Abstract

Tree detection techniques are often used to reduce the complexity of a posteriori probability
(APP) detection in high-dimensional multi-antenna wireless communication systems. In this paper,
we introduce an efficient soft-input soft-output tree detection algorithm that employs a new type of
look-ahead path metric in the computation of its branch pruning (or sorting). While conventional
path metrics depend only on symbols on a visited path, the new path metric accounts for unvisited
parts of the tree in advance through an unconstrained linear estimator and adds a bias term that
reflects the contribution of as-yet undecided symbols. By applying the linear estimate-based look-ahead
path metric to an M-algorithm that selects the best M paths for each level of the tree, we
develop a new soft-input soft-output tree detector, called an improved soft-input soft-output
M-algorithm (ISS-MA). Based on an analysis of the probability of correct path loss, we show that the
improved path metric offers substantial performance gain over the conventional path metric. We
also demonstrate through simulations that the ISS-MA provides a better performance-complexity
trade-off than existing soft-input soft-output detection algorithms.

I. Introduction 

The relationship between the transmitted symbol vector and the received signal vector in many
communication systems can be expressed in the form

$$\mathbf{y}_o = \mathbf{H}\mathbf{x} + \mathbf{n}_o, \tag{1}$$

where $\mathbf{x}$ is the $N \times 1$ transmitted vector whose entries are chosen from a finite symbol alphabet, $\mathbf{y}_o$
and $\mathbf{n}_o$ are the $L \times 1$ received signal and noise vectors, respectively, and $\mathbf{H}$ is the $L \times N$ channel matrix.

J. W. Choi and A. C. Singer are with Dept. of Electrical and Computer Engineering, University of Illinois at 
Urbana-Champaign. B. Shim is with EECS Dept., Korea Univ., Seoul, Korea. 
As a practical decoding scheme when a code constraint is imposed, iterative detection and decoding
(IDD) has been applied to various digital communication systems, including channel equalization
[1], multi-input multi-output (MIMO) detection [2]-[4], and multi-user detection [5]. Motivated by
the turbo principle [6], an IDD receiver exchanges soft information between a symbol detector and a
channel decoder to achieve performance close to the channel capacity. The symbol detector computes
a posteriori probabilities (APPs) on the bits comprising $\mathbf{x}$, using a priori probabilities provided by
the channel decoder and the observation $\mathbf{y}_o$. The detector then exchanges this soft information
(so-called extrinsic information) with a soft-input soft-output decoder, such as the max-log-MAP
decoder [7]. In the sequel, we refer to such a detector as an APP detector.

Direct computation of the APP involves marginalization over all configurations of the vector $\mathbf{x}$,
leading to complexity exponential in the system size (e.g., the number of antenna elements in MIMO
systems). As a means of approximately performing APP detection at reduced complexity, tree
detection techniques have received much attention recently [1], [5]-[12]. (Refer to [17] for an
overview of tree detection techniques.) The essence of these approaches is to produce a set of
promising symbol candidates via a tree search and to estimate the APP over this reduced set. Thus
far, a variety of tree detection algorithms have been proposed. In [3], the sphere decoding algorithm
(SDA) [15], [12] with a fixed radius was used to find symbol candidates. In [4], a priori information
obtained from the channel decoder was exploited to improve the search efficiency of the SDA. In
[5], a hard sphere decoder was employed to find a single maximum a posteriori probability (MAP)
symbol estimate maximizing $\Pr(\mathbf{x}|\mathbf{y}_o)$, and a candidate list was generated by flipping bits in the
MAP estimate. In another approach, the APPs of all bits in $\mathbf{x}$ are obtained simultaneously by modifying the bound
tightening rule of a single sphere search, and a more sophisticated extension of this idea was
introduced in [10]. The computational complexity of these tree detection algorithms varies depending
on the channel and noise realizations, and in the worst case the search complexity is the same as
that of exhaustive search. In order to limit the worst-case complexity of the tree detection approach,
fixed-complexity tree search techniques [50] have been proposed. For example, the M-algorithm was
extended to soft-input soft-output detection in [11], and an intelligent candidate adding algorithm for
improving the efficiency of the M-algorithm was proposed in [12]. The stack algorithm was also exploited
for list generation in combination with soft augmentation of tail bits of stack elements [13]. Other
fixed-complexity soft-input soft-output detection algorithms can be found in [14]-[16].

The M-algorithm [11], [21], also known as the K-best algorithm in the MIMO detection literature
[22], [23], retains only the M best candidates for each layer of the detection tree. The
M-algorithm is a practical candidate for soft-input soft-output detection because it naturally
lends itself to parallel and pipelined processing [23]. In spite of this benefit, the M-algorithm suffers
from a poor performance-complexity trade-off due to its greedy nature. To be
specific, the algorithm checks the validity of paths in the forward direction and never traverses back
for reconsideration. Once the correct path is rejected, it can never be selected again in subsequent
stages, so the remaining search effort is wasted. Moreover, these erroneous decisions often occur in early
candidate selection stages, where the accumulated path metric accounts for only a few symbols. One
way to alleviate such error propagation is symbol detection ordering [12], [24]. By processing each
layer in an appropriate order, the chance of errors propagating to the next stage can be reduced.
Nevertheless, error propagation severely limits the performance of the M-algorithm, especially when
the system size is large.

In this paper, we pursue an improvement of the performance-complexity trade-off of soft-input soft-output
M-algorithms. Towards this end, we propose a new path metric capturing the contribution
of the entire symbol path. While the conventional path metric accounts for the contributions of
symbols along the visited path only, the new path metric looks ahead to the unvisited paths and
estimates their contributions through a soft unconstrained linear symbol estimate. For this purpose, a bias term
reflecting the information from as-yet undecided symbols is incorporated into the conventional path
metric. In order to distinguish this improved path metric from the conventional path
metric and other look-ahead metrics, we henceforth refer to it as a linear estimate-based look-ahead
(LE-LA) path metric. We apply the LE-LA path metric to the soft-input soft-output M-algorithm,
introducing an improved soft-input soft-output M-algorithm (ISS-MA). By sorting paths based on the
LE-LA path metric, the ISS-MA lessens the chance of rejecting the correct path from the candidate list
and eventually improves the detection performance, especially for systems of large dimension. Indeed,
from an analysis of the probability of correct path loss (CPL), we show that the LE-LA path metric
benefits the candidate selection process of the M-algorithm.

The idea of using a look-ahead path metric has been explored in artificial intelligence search
problems [25] and can also be found in soft decoding of linear block codes [56], [57]. In [28],
computationally efficient methods to obtain the bias term were investigated using semi-definite
programming and $H^{\infty}$ estimation techniques. While these approaches search for a deterministic
bias term (a lower bound on the future cost) to guarantee the optimality of the sequential or depth-first
search, our approach uses linear estimation to derive a bias term designed to improve candidate
selection in the breadth-first search. The key advantage of using a linear estimator is that a priori
information can be easily incorporated into the bias term, so that the look-ahead operation benefits
from the decoder output in each iteration. It is also worth emphasizing the difference between the
proposed path metric and the Fano metric [22]. The Fano metric uses the a posteriori probability
of each path as its path metric. For a binary symmetric channel, the Fano metric introduces a
bias term proportional to the path length to penalize paths of short length. The extension of the
Fano metric to channels with memory or to MIMO channels is not straightforward, since it involves
marginalization over the distribution of the undecided symbols. Modifications of the Fano metric are
considered for equalization of intersymbol interference (ISI) channels in [30] and for multi-input multi-output
detection in [11]. As a means to improve the path metric of the SDA, the idea of probabilistic
pruning has also been introduced. In [15], the probability density of an observed signal estimated
from a separate tree search is used as a bias term. While these approaches assign an equal bias term
to paths of the same length, the ISS-MA provides a distinct bias term for each path in the tree,
allowing for the application of a breadth-first search such as the M-algorithm. As such, our path
metric can be readily combined with any type of tree-based soft-input soft-output detector.

The rest of this paper is organized as follows. In Section II, we briefly review the IDD system
and the tree detection algorithm. In Section III, we present the LE-LA path metric along with its
efficient computation. We also describe the application of the LE-LA path metric to the soft-input
soft-output M-algorithm. In Section IV, we present the performance analysis of the ISS-MA. In
Section V, we provide simulation results, and we conclude in Section VI.

We briefly summarize the notation used in this paper. Uppercase and lowercase letters written in
boldface denote matrices and vectors, respectively. The superscripts $(\cdot)^T$ and $(\cdot)^H$ denote transpose
and conjugate transpose (Hermitian operator), respectively. $\|\cdot\|^2$ denotes the squared $L_2$-norm of a
vector, and $\mathrm{diag}(\cdot)$ is a diagonal matrix whose main diagonal contains its arguments. $\mathbf{0}_{M\times N}$ and $\mathbf{1}_{M\times N}$
are $M \times N$ matrices whose entries are all zeros or all ones, respectively. The subscript is omitted if there
is no risk of confusion. $\mathcal{CN}(m, \sigma^2)$ denotes a circularly symmetric complex Gaussian density with
mean $m$ and variance $\sigma^2$. $E_x[\cdot]$ denotes expectation over the random variable $x$. $\mathrm{Cov}(\mathbf{x}, \mathbf{y})$ denotes
$E[\mathbf{x}\mathbf{y}^H] - E[\mathbf{x}]E[\mathbf{y}^H]$. For a Hermitian matrix $\mathbf{A}$, $\mathbf{A} \succeq 0$ (or $\mathbf{A} \succ 0$) means that $\mathbf{A}$ is positive semi-definite
(or positive definite). $\Pr(A)$ denotes the probability of the event $A$. $f_{x_1, x_2, \ldots, x_n}(a_1, a_2, \ldots, a_n)$
denotes the joint probability density function (PDF) of the random variables $x_1, x_2, \ldots, x_n$.
Fig. 1. Block diagram of the IDD system.



II. Problem Description 

In this section, we briefly review the IDD framework and then introduce the tree detection 
algorithms. 

A. Iterative Detection and Decoding (IDD) 

In the transmitter, a rate-$R_c$ channel encoder is used to convert a sequence of independent identically
distributed (i.i.d.) binary information bits $\{b_i\}$ into an encoded sequence $\{c_i\}$. The bit sequence $\{c_i\}$
is permuted by a random interleaver and then mapped into a symbol vector using a $2^Q$-ary
quadrature amplitude modulation (QAM) symbol alphabet. We label the interleaved bits associated
with the $k$th symbol $x_k$ by $c_{k,1}, \ldots, c_{k,Q}$. Due to the interleaver, we assume that these interleaved
bits are mutually uncorrelated.

In the system model (1), $\mathbf{y}_o$ and $\mathbf{n}_o$ are the $L \times 1$ received signal and noise vectors, respectively.
Each entry of the $N \times 1$ symbol vector $\mathbf{x}$ is drawn from a finite alphabet

$$\Theta = \left\{ \frac{a + jb}{P} \;:\; a, b \in \left\{-2^{Q/2}+1,\, -2^{Q/2}+3,\, \ldots,\, 2^{Q/2}-3,\, 2^{Q/2}-1\right\} \right\}, \tag{2}$$

where $P$ is chosen to satisfy the normalization condition $E[|x_k|^2] = 1$. For example, $P = \sqrt{10}$ for
16-QAM and $P = \sqrt{42}$ for 64-QAM modulation.
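As a quick check of the normalization in (2), the following sketch builds the square $2^Q$-QAM alphabet and computes $P$ numerically. It assumes even $Q$ and equiprobable symbols; the function name `qam_alphabet` is ours, not the paper's.

```python
import numpy as np

def qam_alphabet(Q):
    """Normalized square 2^Q-QAM alphabet Theta of eq. (2) and its constant P.

    Assumes Q is even and all symbols are equiprobable.
    """
    m = 2 ** (Q // 2)                          # levels per real dimension
    levels = np.arange(-m + 1, m, 2)           # -2^{Q/2}+1, ..., 2^{Q/2}-1
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    P = np.sqrt(np.mean(np.abs(points) ** 2))  # makes E[|x_k|^2] = 1
    return points / P, P

theta16, P16 = qam_alphabet(4)   # 16-QAM: P16 equals sqrt(10)
theta64, P64 = qam_alphabet(6)   # 64-QAM: P64 equals sqrt(42)
```

Computing $P$ from the mean energy, rather than hard-coding it, keeps the same helper valid for any even $Q$.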

Fig. 1 depicts the basic structure of an IDD system. The receiver consists of two main blocks: the
APP detector and the channel decoder. The APP detector generates the a posteriori log-likelihood
ratio (LLR) of $c_{k,i}$ using the observation $\mathbf{y}_o$ and a priori information delivered from the channel
decoder. The a posteriori LLR is defined as

$$L_{\mathrm{post}}(c_{k,i}) = \ln \frac{\Pr(c_{k,i} = +1 \mid \mathbf{y}_o)}{\Pr(c_{k,i} = -1 \mid \mathbf{y}_o)}, \tag{3}$$
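where we take $c_{k,i} \in \{-1, +1\}$ rather than $\{0, 1\}$ by convention. With the standard noise model
$\mathbf{n}_o \sim \mathcal{CN}(\mathbf{0}, \sigma_n^2\mathbf{I})$, (3) can be rewritten as

$$L_{\mathrm{post}}(c_{k,i}) = \ln \frac{\sum_{\mathbf{x} \in \mathcal{X}_{k,i}^{+}} \exp\left(\psi(\mathbf{x})\right)}{\sum_{\mathbf{x} \in \mathcal{X}_{k,i}^{-}} \exp\left(\psi(\mathbf{x})\right)}, \tag{4}$$

where

$$\psi(\mathbf{x}) = -\frac{1}{\sigma_n^2}\|\mathbf{y}_o - \mathbf{H}\mathbf{x}\|^2 + \sum_{i=1}^{N}\sum_{j=1}^{Q} \ln \Pr(c_{i,j}), \qquad \Pr(c_{i,j}) = \frac{1}{2}\left(1 + c_{i,j}\tanh\left(\frac{L_{\mathrm{pri}}(c_{i,j})}{2}\right)\right). \tag{5}$$

The set $\mathcal{X}_{k,i}^{+}$ is the set of all configurations of the vector $\mathbf{x}$ satisfying $c_{k,i} = +1$ ($\mathcal{X}_{k,i}^{-}$ is defined
similarly), and $L_{\mathrm{pri}}(c_{k,i})$ is the a priori LLR defined as $L_{\mathrm{pri}}(c_{k,i}) = \ln \Pr(c_{k,i} = +1) - \ln \Pr(c_{k,i} = -1)$.
Once $L_{\mathrm{post}}(c_{k,i})$ is computed, the extrinsic LLR is obtained from $L_{\mathrm{ext}}(c_{k,i}) = L_{\mathrm{post}}(c_{k,i}) - L_{\mathrm{pri}}(c_{k,i})$.
These extrinsic LLRs are de-interleaved and then delivered to the channel decoder. The channel
decoder computes the extrinsic LLRs for the coded bits $\{c_i\}$ and feeds them back to the APP detector.
These operations are repeated until a suitably chosen convergence criterion is met.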
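For very small systems, (4) and (5) can be evaluated exactly by enumerating all $2^{NQ}$ bit patterns, which makes the exponential cost concrete. The sketch below does this for QPSK ($Q = 2$) with an assumed Gray labeling (bit 1 sets the sign of the real part, bit 2 the sign of the imaginary part); the labeling and all helper names are hypothetical. It also forms the extrinsic LLR $L_{\mathrm{ext}} = L_{\mathrm{post}} - L_{\mathrm{pri}}$.

```python
import itertools
import numpy as np

def qpsk(c1, c2):
    # Assumed Gray-labeled QPSK: c1 -> sign of real part, c2 -> sign of imaginary part.
    return (c1 + 1j * c2) / np.sqrt(2)

def log_prior(c, L_pri):
    # ln Pr(c) with Pr(c) = (1 + c * tanh(L_pri / 2)) / 2 and c in {-1, +1}, eq. (5)
    return np.log(0.5 * (1.0 + c * np.tanh(L_pri / 2.0)))

def exact_app_llrs(y0, H, sigma2, L_pri):
    """Exact L_post(c_{k,i}) of eq. (4) by enumeration (QPSK, Q = 2).

    L_pri has shape (N, 2); entry [k, i] is the a priori LLR of bit i of symbol k.
    """
    N = H.shape[1]
    labels, psis = [], []
    for bits in itertools.product((-1, 1), repeat=2 * N):
        c = np.array(bits).reshape(N, 2)
        x = qpsk(c[:, 0], c[:, 1])
        psi = -np.sum(np.abs(y0 - H @ x) ** 2) / sigma2 + np.sum(log_prior(c, L_pri))
        labels.append(c)
        psis.append(psi)
    labels, psis = np.array(labels), np.array(psis)
    L_post = np.empty((N, 2))
    for k in range(N):
        for i in range(2):
            plus = labels[:, k, i] == 1
            # log-sum-exp over X^+_{k,i} minus log-sum-exp over X^-_{k,i}
            L_post[k, i] = (np.logaddexp.reduce(psis[plus])
                            - np.logaddexp.reduce(psis[~plus]))
    return L_post

# Usage: a 2 x 2 channel with a priori information on one bit
rng = np.random.default_rng(0)
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
y0 = H @ qpsk(np.array([1, -1]), np.array([1, 1])) + 0.3 * rng.standard_normal(2)
L_pri = np.array([[0.0, 1.5], [0.0, 0.0]])
L_ext = exact_app_llrs(y0, H, 0.5, L_pri) - L_pri
```

Log-sum-exp is used instead of summing `exp` directly so that large $|\psi(\mathbf{x})|$ values do not overflow.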



B. Soft-input Soft-output Tree Detection 

The direct computation of the a posteriori LLR in (4) involves marginalization over $2^{NQ}$ symbol
candidates, which easily becomes infeasible for large systems employing high-order modulations.
A tree detection algorithm addresses this problem by searching for a small set of promising symbol
candidates over which the a posteriori LLRs are estimated. Specifically, a small number of symbol vectors
with large $\psi(\mathbf{x})$, equivalently, small $-\sigma_n^2\psi(\mathbf{x})$, are sought. In the sequel, we refer to $d_{\mathrm{APP}}(\mathbf{x}) =
-\sigma_n^2\psi(\mathbf{x})$ as the cost metric for tree detection, where $\sigma_n^2$ is a scaling factor. The goal of the tree
detection algorithm is to find symbol vectors with small cost metric, and the best (minimum) among
them corresponds to the maximum a posteriori (MAP) solution (denoted by $\mathbf{x}_{\mathrm{MAP}}$).

The tree detection algorithm relies on a tree representation of the search space spanned by $\mathbf{x} =
(x_1, \ldots, x_N) \in \Theta^N$. Tree construction is performed from the root node as follows. First, to represent
the symbol realization of $x_N$, we extend $2^Q$ branches from the root (recall that we assume $2^Q$-ary
QAM modulation). For each such branch, $2^Q$ child branches are extended for the possible realizations
of the next symbol $x_{N-1}$. These branch extensions are repeated until all branches corresponding to
$x_N, \ldots, x_1$ are generated. This yields a tree of depth $N$, where each "complete" path from
the root to a leaf corresponds to a realization of $\mathbf{x}$. In order to find the complete paths of small
cost metric, the tree detection algorithm searches the tree using a systematic node visiting rule. For
notational simplicity, we henceforth denote a path associated with a set of symbols $x_i, \ldots, x_j$ $(i < j)$
by the column vector $\mathbf{x}_i^j = [x_i, \ldots, x_j]^T$. Also, we call the level of the tree associated with the symbol $x_i$ "the
$i$th level" (e.g., the bottom level associated with $x_1$ is the first level). For details on tree construction,
see [17].

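For a systematic search of symbol candidates, a path metric is assigned to each path $\mathbf{x}_k^N$. Towards
this end, we perform a QR decomposition of $\mathbf{H}$ as

$$\mathbf{H} = \mathbf{Q}\begin{bmatrix}\mathbf{R}\\ \mathbf{0}\end{bmatrix} = \begin{bmatrix}\mathbf{Q}_1 & \mathbf{Q}_2\end{bmatrix}\begin{bmatrix}\mathbf{R}\\ \mathbf{0}\end{bmatrix}, \tag{6}$$

where $\mathbf{R}$ is an $N \times N$ upper-triangular matrix whose diagonal entries are non-negative and $\mathbf{Q}_1$ is an $L \times N$
matrix satisfying $\mathbf{Q}_1^H\mathbf{Q}_1 = \mathbf{I}$. Using the invariance of the norm to unitary transformations, we can
express the cost metric $d_{\mathrm{APP}}(\mathbf{x})$ as

$$d_{\mathrm{APP}}(\mathbf{x}) = -\sigma_n^2\psi(\mathbf{x}) = \|\mathbf{y} - \mathbf{R}\mathbf{x}\|^2 - \sigma_n^2\sum_{i=1}^{N}\sum_{j=1}^{Q}\ln\Pr(c_{i,j}) + C \tag{7}$$

$$= \sum_{i=1}^{N} b(\mathbf{x}_i^N) + C, \tag{8}$$

where $b(\mathbf{x}_i^N) = \left|y_i - \sum_{j=i}^{N} r_{i,j}x_j\right|^2 - \sigma_n^2\sum_{j=1}^{Q}\ln\Pr(c_{i,j})$, $\mathbf{y} = [y_1, \ldots, y_N]^T = \mathbf{Q}_1^H\mathbf{y}_o$, and $C = \|\mathbf{Q}_2^H\mathbf{y}_o\|^2$.
The path metric associated with the path $\mathbf{x}_k^N$ can be defined as a partial sum of the cost
metric [17]:

$$\gamma^{(c)}(\mathbf{x}_k^N) = \sum_{i=k}^{N} b(\mathbf{x}_i^N). \tag{9}$$

Whenever a new node is visited, the term $b(\mathbf{x}_k^N)$, referred to as a branch metric, is added to the
path metric of the parent node. Since the branch metric is non-negative for all $i$, the path metric
$\gamma^{(c)}(\mathbf{x}_k^N)$ is a lower bound on the cost metric $d_{\mathrm{APP}}(\mathbf{x})$. Using $\gamma^{(c)}(\mathbf{x}_k^N)$, the tree detection
algorithm compares the reliability of distinct paths and chooses the surviving paths. Since this path
metric is determined by the visited path only, we henceforth refer to $\gamma^{(c)}(\mathbf{x}_k^N)$ as the causal path metric.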
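The sketch below checks two facts used above: the rotated problem preserves the cost up to the constant $C = \|\mathbf{Q}_2^H\mathbf{y}_o\|^2 = \|\mathbf{y}_o\|^2 - \|\mathbf{Q}_1^H\mathbf{y}_o\|^2$, and the causal path metric (9) accumulates branch by branch from level $N$ down to level $k$. It assumes NumPy's reduced QR followed by a diagonal phase correction, and omits the a priori part of each branch metric unless `xi` is supplied; the helper names are ours.

```python
import numpy as np

def qr_nonneg(H):
    """Thin QR H = Q1 R with a real, non-negative diagonal of R, as assumed in (6)."""
    Q1, R = np.linalg.qr(H)                    # reduced mode: Q1 is L x N, R is N x N
    phase = np.exp(1j * np.angle(np.diag(R)))  # unit-modulus phase of each diagonal entry
    # H = (Q1 D)(D^H R) for the diagonal unitary D = diag(phase)
    return Q1 * phase[None, :], np.conj(phase)[:, None] * R

def causal_path_metric(y, R, x_path, k, xi=None):
    """gamma^(c)(x_k^N) of eq. (9) for a 1-based level k.

    x_path holds x_k, ..., x_N. xi, if given, holds xi(x_i) = -sigma_n^2 sum_j ln Pr(c_{i,j})
    for i = 1..N; otherwise the a priori part of each branch metric is omitted.
    """
    N = R.shape[0]
    x = np.zeros(N, dtype=complex)
    x[k - 1:] = x_path
    metric = 0.0
    for i in range(N, k - 1, -1):              # levels N, N-1, ..., k
        v = y[i - 1] - R[i - 1, i - 1:] @ x[i - 1:]   # y_i - sum_{j>=i} r_{i,j} x_j
        metric += abs(v) ** 2 + (0.0 if xi is None else xi[i - 1])
    return metric

# Usage: the full-length path metric equals the cost up to the constant C
rng = np.random.default_rng(1)
L, N = 5, 3
H = rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))
y0 = rng.standard_normal(L) + 1j * rng.standard_normal(L)
x = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
Q1, R = qr_nonneg(H)
y = Q1.conj().T @ y0
C = np.linalg.norm(y0) ** 2 - np.linalg.norm(y) ** 2
```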

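According to a predefined node visiting rule [17], the tree detection algorithm attempts to find the
complete paths associated with the smallest cost metrics. Denoting the set of the corresponding symbol
candidates as $\mathcal{L}$, an approximate APP can be expressed as

$$L_{\mathrm{post}}(c_{k,i}) \approx \ln\frac{\sum_{\mathbf{x} \in \mathcal{L}\cap\mathcal{X}_{k,i}^{+}} \exp\left(\psi(\mathbf{x})\right)}{\sum_{\mathbf{x} \in \mathcal{L}\cap\mathcal{X}_{k,i}^{-}} \exp\left(\psi(\mathbf{x})\right)}. \tag{10}$$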
Further simplification can be achieved using the max-log approximation [7],

$$L_{\mathrm{post}}(c_{k,i}) \approx \max_{\mathbf{x} \in \mathcal{L}\cap\mathcal{X}_{k,i}^{+}} \psi(\mathbf{x}) - \max_{\mathbf{x} \in \mathcal{L}\cap\mathcal{X}_{k,i}^{-}} \psi(\mathbf{x}). \tag{11}$$

Since $\mathcal{L}$ does not span the whole symbol space, either $\mathcal{L}\cap\mathcal{X}_{k,i}^{+}$ or $\mathcal{L}\cap\mathcal{X}_{k,i}^{-}$ might be empty for some
values of $k$ and $i$. If this happens, the magnitude of $L_{\mathrm{post}}(c_{k,i})$ is set to infinity, causing a bias in the LLR
values. One way to cope with this event is to clip the magnitude of $L_{\mathrm{post}}(c_{k,i})$ to a constant value
(e.g., $\pm 8$) [3].
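The max-log rule (11) with clipping can be sketched as follows. Bit labels in $\{-1,+1\}$ are passed in explicitly, the clipping level of 8 follows the example above, and the function name is ours.

```python
import numpy as np

def maxlog_llrs(cand_bits, psi, clip=8.0):
    """Max-log LLRs (11) over a candidate list, with magnitude clipping.

    cand_bits: (|L|, N, Q) bit labels in {-1, +1} of the candidates
    psi:       (|L|,) values psi(x) of the candidates (any common offset cancels)
    """
    _, N, Q = cand_bits.shape
    llr = np.empty((N, Q))
    for k in range(N):
        for i in range(Q):
            plus = cand_bits[:, k, i] == 1
            best_plus = psi[plus].max() if plus.any() else -np.inf
            best_minus = psi[~plus].max() if (~plus).any() else -np.inf
            llr[k, i] = best_plus - best_minus   # +/-inf when one side is empty
    return np.clip(llr, -clip, clip)             # bound the magnitude, e.g. at 8
```

Because only differences of maxima enter (11), any constant added to every $\psi(\mathbf{x})$ (such as $C/\sigma_n^2$) has no effect.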

III. Improved Soft-input Soft-output M-algorithm (ISS-MA) 

In this section, we present the ISS-MA, which improves the candidate selection process of soft-input
soft-output tree detection algorithms. We first describe a genie-aided path metric that motivates our
work and then introduce the new path metric that accounts for the information on unvisited paths. 
We also discuss an efficient way to compute the new path metric. 

A. Motivation 

We begin our discussion with the following path metric.

Definition 3.1: A genie-aided path metric $\gamma^{(g)}(\mathbf{x}_k^N)$ is defined as

$$\gamma^{(g)}(\mathbf{x}_k^N) = \gamma^{(c)}(\mathbf{x}_k^N) + \underbrace{\min_{\mathbf{x}_1^{k-1}}\left(\sum_{i=1}^{k-1} b(\mathbf{x}_i^N)\right)}_{\text{bias term}}. \tag{12}$$

The genie-aided path metric is obtained by minimizing the sum of $b(\mathbf{x}_i^N)$ $(1 \le i \le k-1)$ over all
combinations of the undecided symbols $\mathbf{x}_1^{k-1}$. This minimal term, which can be considered as a bias term,
is added to the causal path metric. The genie-aided path metric can be used in the M-algorithm so
that the best M candidates with the smallest genie-aided path metric are selected at each tree level.
It is easy to show that the M-algorithm employing the genie-aided path metric finds the closest
(best) path with probability one (even for M = 1). This follows since the genie-aided
path metric of $\mathbf{x}_k^N$ equals the smallest cost metric among all tail paths extending $\mathbf{x}_k^N$, so the prefix of the best complete path always attains the minimum. A similar path metric has
appeared in prior work.
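To illustrate why the genie-aided metric (12) never loses the best path, the sketch below evaluates its bias term by brute-force enumeration of the undecided symbols (exactly the cost that makes it impractical) and runs an M = 1 search. It assumes QPSK and omits the a priori terms (with zero LLRs they add the same constant to every path of a given length); all names are illustrative.

```python
import itertools
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)   # assumed alphabet

def dist(y, R, x):
    # sum_i |y_i - sum_{j>=i} r_{i,j} x_j|^2, i.e., the branch metrics without priors
    return float(np.sum(np.abs(y - R @ x) ** 2))

def genie_metric(y, R, tail, k):
    """gamma^(g)(x_k^N): minimum over all x_1^{k-1} of the total distance.

    Since gamma^(c)(x_k^N) does not depend on x_1^{k-1}, this equals
    gamma^(c)(x_k^N) plus the bias term of eq. (12).
    """
    return min(dist(y, R, np.concatenate([np.array(head, dtype=complex), tail]))
               for head in itertools.product(QPSK, repeat=k - 1))

def genie_m1_search(y, R):
    """Greedy (M = 1) tree search from level N down to level 1 using gamma^(g)."""
    N = R.shape[0]
    tail = np.zeros(0, dtype=complex)
    for k in range(N, 0, -1):
        scores = [genie_metric(y, R, np.concatenate([[s], tail]), k) for s in QPSK]
        tail = np.concatenate([[QPSK[int(np.argmin(scores))]], tail])
    return tail

def exhaustive_min(y, R):
    cands = (np.array(c, dtype=complex) for c in itertools.product(QPSK, repeat=R.shape[0]))
    return min(cands, key=lambda x: dist(y, R, x))
```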

Theorem 3.2: Given the actual transmitted symbol vector $\mathbf{s}$ (i.e., $\mathbf{x}_k^N = \mathbf{s}_k^N$), the bias term of
the genie-aided path metric is

$$\min_{\mathbf{x}_1^{k-1}}\left(\sum_{i=1}^{k-1} b(\mathbf{x}_i^N)\right) = \sum_{i=1}^{k-1} b\left(\begin{bmatrix}\hat{\mathbf{x}}_i^{k-1}\\ \mathbf{s}_k^N\end{bmatrix}\right), \tag{13}$$

where the minimizer $\hat{\mathbf{x}}_1^{k-1}$ is the MAP estimate of $\mathbf{x}_1^{k-1}$, i.e.,

$$\hat{\mathbf{x}}_1^{k-1} = \arg\max_{\mathbf{x}_1^{k-1} \in \Theta^{k-1}} \ln\Pr\left(\mathbf{x}_1^{k-1} \mid \mathbf{y}, \mathbf{x}_k^N = \mathbf{s}_k^N\right). \tag{14}$$

Proof: See Appendix A.

Theorem 3.2 implies that the bias term of the genie-aided path metric is obtained by computing
$\sum_{i=1}^{k-1} b(\mathbf{x}_i^N)$ using the MAP estimate of $\mathbf{x}_1^{k-1}$. This MAP estimate is derived under the condition
that the path associated with the actual transmitted symbols, $\mathbf{s}_k^N$, is given. Though the genie-aided
path metric offers a substantial performance gain, it is impractical to incorporate it into tree search
due to the high complexity associated with the MAP estimation.



B. Derivation of Linear Estimate-Based Look-Ahead (LE-LA) Path Metric 

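In order to alleviate the complexity associated with MAP detection of $\mathbf{x}_1^{k-1}$ in the genie-aided
path metric, we relax the finite alphabet constraint on $\mathbf{x}_1^{k-1}$ and then replace the MAP estimate by
the linear MMSE estimate $\tilde{\mathbf{x}}_1^{k-1}$. Note that when $\mathbf{x}_1^{k-1}$ is assumed to be Gaussian, the MAP estimate
is identical to the linear MMSE estimate [32]. For a particular visited path $\mathbf{x}_k^N$, we first define the
LE-LA path metric.

Definition 3.3: The linear estimate-based look-ahead path metric, denoted by $\gamma^{(l)}(\mathbf{x}_k^N)$, is defined
as

$$\gamma^{(l)}(\mathbf{x}_k^N) = \gamma^{(c)}(\mathbf{x}_k^N) + \underbrace{\sum_{i=1}^{k-1}\left|y_i - \sum_{j=i}^{k-1} r_{i,j}\tilde{x}_j - \sum_{j=k}^{N} r_{i,j}x_j\right|^2}_{\text{bias term, } \gamma^{(b)}(\mathbf{x}_k^N)}, \tag{15}$$

where $\tilde{\mathbf{x}}_1^{k-1} = [\tilde{x}_1, \ldots, \tilde{x}_{k-1}]^T$ is the linear MMSE estimate of $\mathbf{x}_1^{k-1}$. Note that $\tilde{\mathbf{x}}_1^{k-1}$ is obtained under the condition
that the visited symbols $\mathbf{x}_k^N$ are the transmitted ones. In the sequel, we denote this bias
term as $\gamma^{(b)}(\mathbf{x}_k^N)$.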

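To derive the linear MMSE estimate $\tilde{\mathbf{x}}_1^{k-1}$, we partition the vectors $\mathbf{y}$, $\mathbf{x}$, and $\mathbf{n}\ (= \mathbf{Q}_1^H\mathbf{n}_o)$ into $(k-1)\times 1$
and $(N-k+1)\times 1$ subvectors, i.e.,

$$\begin{bmatrix}\mathbf{y}_1^{k-1}\\ \mathbf{y}_k^N\end{bmatrix} = \begin{bmatrix}\mathbf{R}_{11,k} & \mathbf{R}_{12,k}\\ \mathbf{0} & \mathbf{R}_{22,k}\end{bmatrix}\begin{bmatrix}\mathbf{x}_1^{k-1}\\ \mathbf{x}_k^N\end{bmatrix} + \begin{bmatrix}\mathbf{n}_1^{k-1}\\ \mathbf{n}_k^N\end{bmatrix}, \tag{16}$$

where $\mathbf{R}_{11,k}$, $\mathbf{R}_{12,k}$, and $\mathbf{R}_{22,k}$ are the correspondingly partitioned sub-matrices of $\mathbf{R}$. Using (16), $\gamma^{(l)}(\mathbf{x}_k^N)$
can be expressed as

$$\gamma^{(l)}(\mathbf{x}_k^N) = \gamma^{(c)}(\mathbf{x}_k^N) + \gamma^{(b)}(\mathbf{x}_k^N), \tag{17}$$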
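where

$$\gamma^{(c)}(\mathbf{x}_k^N) = \|\mathbf{y}_k^N - \mathbf{R}_{22,k}\mathbf{x}_k^N\|^2 + \xi(\mathbf{x}_k^N), \tag{18}$$

$$\gamma^{(b)}(\mathbf{x}_k^N) = \|\mathbf{y}_1^{k-1} - \mathbf{R}_{11,k}\tilde{\mathbf{x}}_1^{k-1} - \mathbf{R}_{12,k}\mathbf{x}_k^N\|^2, \tag{19}$$

and $\xi(\mathbf{x}_k^N) = -\sigma_n^2\sum_{i=k}^{N}\sum_{j=1}^{Q}\ln\Pr(c_{i,j})$. Note that the term generated by a priori information,
$\xi(\mathbf{x}_k^N)$, considers only $\mathbf{x}_k^N$ since the symbols $\mathbf{x}_1^{k-1}$ are undecided. Note also that the linear MMSE
estimate of the non-causal symbols $\mathbf{x}_1^{k-1}$ is given by

$$\tilde{\mathbf{x}}_1^{k-1} = E\left[\mathbf{x}_1^{k-1}\right] + \mathrm{Cov}\left(\mathbf{x}_1^{k-1}, \mathbf{y}_1^{k-1} \mid \mathbf{x}_k^N\right)\mathrm{Cov}^{-1}\left(\mathbf{y}_1^{k-1} \mid \mathbf{x}_k^N\right)\left(\mathbf{y}_1^{k-1} - E\left[\mathbf{y}_1^{k-1} \mid \mathbf{x}_k^N\right]\right) \tag{20}$$

$$= \mathbf{F}_k\left(\mathbf{y}_1^{k-1} - \mathbf{R}_{11,k}\bar{\mathbf{x}}_1^{k-1} - \mathbf{R}_{12,k}\mathbf{x}_k^N\right) + \bar{\mathbf{x}}_1^{k-1}, \tag{21}$$

where $\bar{\mathbf{x}}_1^{k-1} = E[\mathbf{x}_1^{k-1}]$ and $\mathbf{F}_k = \mathrm{Cov}(\mathbf{x}_1^{k-1}, \mathbf{y}_1^{k-1} \mid \mathbf{x}_k^N)\,\mathrm{Cov}^{-1}(\mathbf{y}_1^{k-1} \mid \mathbf{x}_k^N)$. We can obtain
$\bar{\mathbf{x}}_1^{k-1} = [\bar{x}_1, \ldots, \bar{x}_{k-1}]^T$ and $\mathbf{F}_k$ from the a priori LLRs as [2]

$$\bar{x}_i = \sum_{\theta\in\Theta}\theta\prod_{j=1}^{Q}\frac{1}{2}\left(1 + c_{i,j}(\theta)\tanh\left(\frac{L_{\mathrm{pri}}(c_{i,j})}{2}\right)\right), \quad i = 1, \ldots, k-1, \tag{22}$$

$$\mathbf{F}_k = \mathbf{\Lambda}_k\mathbf{R}_{11,k}^H\left(\mathbf{R}_{11,k}\mathbf{\Lambda}_k\mathbf{R}_{11,k}^H + \sigma_n^2\mathbf{I}\right)^{-1}, \tag{23}$$

where $c_{i,j}(\theta)$ denotes the $j$th bit of the label of $\theta$, $\mathbf{\Lambda}_k = \mathrm{diag}(\lambda_1, \ldots, \lambda_{k-1})$, and
$\lambda_i = \sum_{\theta\in\Theta}|\theta - \bar{x}_i|^2\prod_{j=1}^{Q}\frac{1}{2}\left(1 + c_{i,j}(\theta)\tanh\left(\frac{L_{\mathrm{pri}}(c_{i,j})}{2}\right)\right)$. The set $\Theta$
includes all possible constellation points. In the first iteration of the IDD, where the a priori LLRs
$L_{\mathrm{pri}}(c_{i,j})$ are unavailable, $\mathbf{\Lambda}_k = \mathbf{I}$ and $\bar{\mathbf{x}}_1^{k-1} = \mathbf{0}$.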
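Using (19) and (21), $\gamma^{(b)}(\mathbf{x}_k^N)$ can be rewritten as

$$\gamma^{(b)}(\mathbf{x}_k^N) = \left\|\left(\mathbf{I} - \mathbf{R}_{11,k}\mathbf{F}_k\right)\left(\mathbf{y}_1^{k-1} - \mathbf{R}_{11,k}\bar{\mathbf{x}}_1^{k-1} - \mathbf{R}_{12,k}\mathbf{x}_k^N\right)\right\|^2 \tag{24}$$

$$= \left\|\mathbf{Z}_k\left(\mathbf{y}_1^{k-1} - \mathbf{R}_{11,k}\bar{\mathbf{x}}_1^{k-1} - \mathbf{R}_{12,k}\mathbf{x}_k^N\right)\right\|^2, \tag{25}$$

where

$$\mathbf{Z}_k = \mathbf{I} - \mathbf{R}_{11,k}\mathbf{\Lambda}_k\mathbf{R}_{11,k}^H\left(\mathbf{R}_{11,k}\mathbf{\Lambda}_k\mathbf{R}_{11,k}^H + \sigma_n^2\mathbf{I}\right)^{-1} \tag{26}$$

$$= \sigma_n^2\left(\mathbf{R}_{11,k}\mathbf{\Lambda}_k\mathbf{R}_{11,k}^H + \sigma_n^2\mathbf{I}\right)^{-1}. \tag{27}$$

Further, denoting $\mathbf{q}_k = \mathbf{Z}_k(\mathbf{y}_1^{k-1} - \mathbf{R}_{11,k}\bar{\mathbf{x}}_1^{k-1})$ and $\mathbf{P}_k = \mathbf{Z}_k\mathbf{R}_{12,k}$, $\gamma^{(l)}(\mathbf{x}_k^N)$ can be simply expressed
as

$$\gamma^{(l)}(\mathbf{x}_k^N) = \gamma^{(c)}(\mathbf{x}_k^N) + \underbrace{\|\mathbf{q}_k - \mathbf{P}_k\mathbf{x}_k^N\|^2}_{\text{bias term}}. \tag{28}$$

Note that the bias term $\|\mathbf{q}_k - \mathbf{P}_k\mathbf{x}_k^N\|^2$ of the LE-LA path metric can be computed with only linear
(matrix-vector) operations. Note also that a priori information obtained from the channel decoder is reflected through
$\bar{\mathbf{x}}_1^{k-1}$ and $\mathbf{\Lambda}_k$ in the bias term.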
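The sketch below computes the bias term in two ways: directly from the LMMSE estimate (21) via (19), and through $\mathbf{Z}_k$ in (27) with $\mathbf{q}_k$ and $\mathbf{P}_k$ as in (28). Agreement of the two confirms $\mathbf{Z}_k = \mathbf{I} - \mathbf{R}_{11,k}\mathbf{F}_k$ numerically. It assumes a 1-based level $k \ge 2$ and uses explicit inverses for clarity rather than efficiency; the function name is ours.

```python
import numpy as np

def lela_bias(y, R, x_tail, k, x_bar, lam, sigma2):
    """Bias term gamma^(b)(x_k^N) computed two ways; requires 1-based k >= 2.

    x_tail holds the visited symbols x_k, ..., x_N; x_bar and lam hold the prior
    means and variances of all N symbols. Returns (direct, via_Z).
    """
    m = k - 1                                  # number of undecided symbols
    R11, R12 = R[:m, :m], R[:m, m:]
    y1, xb1 = y[:m], x_bar[:m]
    Lam = np.diag(lam[:m])
    Sigma = R11 @ Lam @ R11.conj().T + sigma2 * np.eye(m)
    F = Lam @ R11.conj().T @ np.linalg.inv(Sigma)          # eq. (23)
    resid = y1 - R11 @ xb1 - R12 @ x_tail
    x_tilde = F @ resid + xb1                              # eq. (21)
    direct = np.sum(np.abs(y1 - R11 @ x_tilde - R12 @ x_tail) ** 2)   # eq. (19)
    Z = sigma2 * np.linalg.inv(Sigma)                      # eq. (27)
    q, P = Z @ (y1 - R11 @ xb1), Z @ R12
    via_Z = np.sum(np.abs(q - P @ x_tail) ** 2)            # bias term of eq. (28)
    return direct, via_Z
```

In practice $\mathbf{q}_k$ and $\mathbf{P}_k$ would be formed once per level and reused for every candidate path.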



C. Efficient Computation of Path Metric 

In this subsection, we discuss how the LE-LA path metric can be computed efficiently. Recalling
that the bias term is expressed as $\|\mathbf{Z}_k(\mathbf{y}_1^{k-1} - \mathbf{R}_{11,k}\bar{\mathbf{x}}_1^{k-1} - \mathbf{R}_{12,k}\mathbf{x}_k^N)\|^2$, computation of the path
metric is divided into two steps: 1) computation of $\mathbf{Z}_k$ for all $k$ prior to the tree search, and 2)
recursive update of the path metric for each branch extension during the search.

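First, using a matrix inversion formula for block matrices [55, Appendix 1.1.3], the operators
$\mathbf{Z}_k$ $(k = 1, \ldots, N)$ in (27) can be computed recursively. Denoting

$$\mathbf{R}_{11,k+1} = \begin{bmatrix}\mathbf{R}_{11,k} & \mathbf{r}_k\\ \mathbf{0}^T & r_{k,k}\end{bmatrix}, \qquad \mathbf{\Lambda}_{k+1} = \begin{bmatrix}\mathbf{\Lambda}_k & \mathbf{0}\\ \mathbf{0}^T & \lambda_k\end{bmatrix}, \tag{29}$$

where $\mathbf{r}_k = [r_{1,k}, \ldots, r_{k-1,k}]^T$, $\mathbf{Z}_{k+1}$ is expressed as a function of $\mathbf{Z}_k$ as

$$\mathbf{Z}_{k+1} = \begin{bmatrix}\mathbf{Z}_k - K\lambda_k\mathbf{Z}_k\mathbf{r}_k\mathbf{r}_k^H\mathbf{Z}_k & -K\lambda_k r_{k,k}\mathbf{Z}_k\mathbf{r}_k\\ -K\lambda_k r_{k,k}\mathbf{r}_k^H\mathbf{Z}_k & K\left(\lambda_k\mathbf{r}_k^H\mathbf{Z}_k\mathbf{r}_k + \sigma_n^2\right)\end{bmatrix}, \tag{30}$$

where

$$K = \frac{1}{\lambda_k\left(\mathbf{r}_k^H\mathbf{Z}_k\mathbf{r}_k + r_{k,k}^2\right) + \sigma_n^2}. \tag{31}$$

In particular, $\mathbf{Z}_2 = \sigma_n^2/(\lambda_1 r_{1,1}^2 + \sigma_n^2)$. See Appendix B for the derivation of (30). If the a priori LLRs are all
zero, $\mathbf{Z}_k$ does not need to be recomputed for every symbol vector as long as the channel remains constant. If
the a priori LLRs are nonzero, these steps are performed for each symbol vector. However, the required
computations can be further reduced by replacing the instantaneous covariance matrix $\mathbf{\Lambda}_k$ by its
time average over a coherence time [1].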
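The recursion (30)-(31) can be checked against the direct formula (27). In the sketch below (helper names are ours), `R` is assumed to have a real, non-negative diagonal and `lam` holds $\lambda_1, \ldots, \lambda_N$. Each update needs only matrix-vector products and one scalar division.

```python
import numpy as np

def z_direct(R, lam, sigma2, k):
    """Z_k of eq. (27) for a 1-based level k >= 2, size (k-1) x (k-1)."""
    m = k - 1
    R11 = R[:m, :m]
    Sigma = R11 @ np.diag(lam[:m]) @ R11.conj().T + sigma2 * np.eye(m)
    return sigma2 * np.linalg.inv(Sigma)

def z_update(Z, R, lam, sigma2, k):
    """Z_{k+1} from Z_k via (30)-(31), without a matrix inversion."""
    r = R[:k - 1, k - 1]                       # r_k = [r_{1,k}, ..., r_{k-1,k}]^T
    rkk = R[k - 1, k - 1].real                 # non-negative real diagonal entry
    lk = lam[k - 1]
    Zr = Z @ r
    quad = np.real(np.vdot(r, Zr))             # r_k^H Z_k r_k
    K = 1.0 / (lk * (quad + rkk ** 2) + sigma2)             # eq. (31)
    top_left = Z - K * lk * np.outer(Zr, Zr.conj())          # Z r r^H Z (Z is Hermitian)
    top_right = -K * lk * rkk * Zr[:, None]
    bottom_left = -K * lk * rkk * Zr.conj()[None, :]
    bottom_right = np.array([[K * (lk * quad + sigma2)]])
    return np.block([[top_left, top_right], [bottom_left, bottom_right]])
```

The update is a rank-one (Sherman-Morrison) correction of $\mathrm{diag}(\mathbf{Z}_k, 1)$, which is why no inversion is needed.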

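Next, the LE-LA path metric can be recursively updated for each tree extension. At the root node,
a vector $\mathbf{a}_{N+1}$ is defined as $\mathbf{a}_{N+1} = \mathbf{y} - \mathbf{R}\bar{\mathbf{x}}$, where $\bar{\mathbf{x}} = [\bar{x}_1, \ldots, \bar{x}_N]^T$. The vector $\mathbf{a}_k$ is updated from that of its parent
node as

$$\begin{bmatrix}\mathbf{a}_k\\ v_k\end{bmatrix} = \mathbf{a}_{k+1} - \begin{bmatrix}r_{1,k}\\ \vdots\\ r_{k,k}\end{bmatrix}(x_k - \bar{x}_k), \tag{32}$$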
where $v_k$ is a scalar variable. Using the vector $\mathbf{a}_k$ for each path $\mathbf{x}_k^N$, the LE-LA path metric can be
obtained as

$$\gamma^{(l)}(\mathbf{x}_k^N) = \gamma^{(c)}(\mathbf{x}_k^N) + \gamma^{(b)}(\mathbf{x}_k^N), \tag{33}$$

$$\gamma^{(c)}(\mathbf{x}_k^N) = \gamma^{(c)}(\mathbf{x}_{k+1}^N) + |v_k|^2 + \xi(x_k), \tag{34}$$

$$\gamma^{(b)}(\mathbf{x}_k^N) = \|\mathbf{Z}_k\mathbf{a}_k\|^2, \tag{35}$$

where $\xi(x_k) = -\sigma_n^2\sum_{j=1}^{Q}\ln\Pr(c_{k,j})$ and $\gamma^{(c)}(\mathbf{x}_{N+1}^N) = 0$. Noting that the dimension of the matrix $\mathbf{Z}_k$ is $(k-1)\times(k-1)$, the number
of complex multiplications for the bias term computation is proportional to $(k-1)^2$. In order to
reduce the complexity, we can look ahead only $N_l\ (< k-1)$ symbols instead of all non-causal symbols.
Towards this goal, we set $a = \max(0, k - N_l - 1)$ and repartition the system as



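$$\begin{bmatrix}\mathbf{y}_{a+1}^{k-1}\\ \mathbf{y}_k^N\end{bmatrix} = \begin{bmatrix}\mathbf{R}_{11,k} & \mathbf{R}_{12,k}\\ \mathbf{0} & \mathbf{R}_{22,k}\end{bmatrix}\begin{bmatrix}\mathbf{x}_{a+1}^{k-1}\\ \mathbf{x}_k^N\end{bmatrix} + \begin{bmatrix}\mathbf{n}_{a+1}^{k-1}\\ \mathbf{n}_k^N\end{bmatrix}, \tag{36}$$

where $\mathbf{R}_{11,k}$ and $\mathbf{R}_{12,k}$ are the redefined sub-matrices of (16). In this case, the bias term
defined in Section III-B needs to be modified based on this partitioning. In doing so, the dimension
of $\mathbf{R}_{11,k}$ and $\mathbf{Z}_k$ is reduced from $(k-1)\times(k-1)$ to $N_l \times N_l$. The recursive computation of $\mathbf{Z}_k$
employing the new partitioning can be derived without matrix inversion (see [37, Section III.A]).
In addition, in (35), we only need to multiply $\mathbf{Z}_k$ with the last $N_l$ elements of $\mathbf{a}_k$. Overall, by using
only $N_l$ non-causal symbols for the bias term, the number of operations for the bias computation
can be reduced from $M\sum_{k=1}^{N}(k-1)^2\ \left(= \frac{M}{6}(2N^3 - 3N^2 + N)\right)$ to $MNN_l^2$.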
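The per-branch update (32)-(35) and the operation counts can be sketched as follows. The code walks a single path from the root, assumes the matrices $\mathbf{Z}_k$ are precomputed (for example with (27)), and omits the a priori term $\xi(x_k)$ unless it is supplied; the names are illustrative.

```python
import numpy as np

def lela_metrics_recursive(y, R, x, x_bar, Zs, xi=None):
    """Path metrics gamma^(l)(x_k^N), k = N..1, along one path via (32)-(35).

    Zs[k] holds Z_k for 1-based k = 2..N; xi, if given, holds xi(x_k) for k = 1..N.
    """
    N = R.shape[0]
    a = y - R @ x_bar                          # a_{N+1} at the root
    gamma_c = 0.0                              # gamma^(c)(x_{N+1}^N) = 0
    metrics = {}
    for k in range(N, 0, -1):
        # (32): subtract column k of R (rows 1..k) scaled by (x_k - x_bar_k)
        upd = a - R[:k, k - 1] * (x[k - 1] - x_bar[k - 1])
        a, v = upd[:k - 1], upd[k - 1]         # a_k (length k-1) and scalar v_k
        gamma_c += abs(v) ** 2 + (0.0 if xi is None else xi[k - 1])     # (34)
        gamma_b = np.sum(np.abs(Zs[k] @ a) ** 2) if k > 1 else 0.0      # (35)
        metrics[k] = gamma_c + gamma_b                                  # (33)
    return metrics

def bias_mults(N, M, N_l=None):
    """Multiplications for the bias terms: M * sum_k (k-1)^2, or M * N * N_l^2 if truncated."""
    if N_l is None:
        return M * sum((k - 1) ** 2 for k in range(1, N + 1))
    return M * N * N_l ** 2
```

For $N = 10$ and $M = 4$, for example, `bias_mults` gives $4 \cdot 285 = 1140$ with full look-ahead and $360$ with $N_l = 3$, matching the two expressions above.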



D. Application to APP Detection 

In this subsection, we introduce the soft-input soft-output tree detection algorithm employing the
LE-LA path metric. To reduce errors in early detection stages, symbol detection ordering is performed
first. The V-BLAST ordering [24] or B-Chase preprocessing [12] can be adopted. Note that
B-Chase preprocessing is preferred when M is larger than the constellation size $2^Q$. At each level of the
tree, the extended paths are compared using $\gamma^{(l)}(\mathbf{x}_k^N)$ and the M best paths are selected. Starting
from the root node, this candidate selection procedure continues to the bottom level, eventually
producing $2^Q M$ complete paths. The symbol vectors corresponding to these complete paths form
a candidate list $\mathcal{L}$, over which the extrinsic LLR for each bit is calculated. In the event that a
particular bit takes the same value (all $+1$ or all $-1$) in every candidate, the magnitude of the
generated LLR might become unduly large, limiting the error-correction capability of the channel
decoder [38]. In order to prevent this situation, whenever this occurs for the bit $c_{k,i}$ of the candidate
list, the bit $c_{k,i}$ of each of the best J candidates is flipped and the resulting vectors are added to the candidate list
$\mathcal{L}$, generating an extended list $\mathcal{L}_{k,i}^{\mathrm{ext}}$. As a result, $\mathcal{L}_{k,i}^{\mathrm{ext}}$ contains $|\mathcal{L}| + J$ candidates. A more desirable
flipping method would be to flip the corresponding bit of all candidates and then select the J of them
with the lowest cost metric. Since this method increases the complexity considerably,
we employ the alternative that flips the best J candidates. Though the current approach
produces a slightly degraded counter-hypothesis set, we expect it to coincide with, or at
least largely overlap, the aforementioned best counter-hypothesis set. Using the list $\mathcal{L}_{k,i}^{\mathrm{ext}}$
together with the max-log approximation, the APP becomes

$$L_{\mathrm{post}}(c_{k,i}) \approx \max_{\mathbf{x} \in \mathcal{L}_{k,i}^{\mathrm{ext}}\cap\mathcal{X}_{k,i}^{+}} \psi(\mathbf{x}) - \max_{\mathbf{x} \in \mathcal{L}_{k,i}^{\mathrm{ext}}\cap\mathcal{X}_{k,i}^{-}} \psi(\mathbf{x}). \tag{37}$$

A summary of the ISS-MA is provided in Table I.

TABLE I
Summary of ISS-MA

Input: $\mathbf{y}$, $\mathbf{H}$, $\{L_{\mathrm{pri}}(c_{k,i})\}_{k=1:N,\ i=1:Q}$, $N_l$, and $J$
Output: $\{L_{\mathrm{post}}(c_{k,i})\}_{k=1:N,\ i=1:Q}$

STEP 1: (Preprocessing) Order $\mathbf{x}$ and $\mathbf{H}$ according to V-BLAST ordering [24] or B-Chase preprocessing
[12]. Then, compute $\mathbf{Z}_k$ for all $k$.

STEP 2: (Initialization) Initialize $i = N$ and start the tree search from the root node (regarded as level $N+1$).

STEP 3: (Loop) Extend $2^Q$ branches for each of the M paths that have survived at the $(i+1)$th level. This
generates $2^Q M$ paths at the $i$th level.

STEP 4: If $i > 1$, choose the M best paths with the smallest $\gamma^{(l)}(\mathbf{x}_i^N)$ and go to STEP 3 with $i \leftarrow i - 1$.
Otherwise, store all surviving candidates in the list $\mathcal{L}$ and go to STEP 5.

STEP 5: (List extension & APP calculation) For each value of $k$ and $i$, compute $L_{\mathrm{post}}(c_{k,i})$ based on
$\mathcal{L}$. If the value of $c_{k,i}$ for all elements of $\mathcal{L}$ is either $+1$ or $-1$, the value of $c_{k,i}$ of the best J candidates
(associated with the minimum cost metric) is flipped, and these counter-hypothesis candidates are added
to $\mathcal{L}$ to generate the extended list $\mathcal{L}_{k,i}^{\mathrm{ext}}$. The APP is calculated over the extended list based on (37).
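Putting the pieces together, the following is a compact sketch in the spirit of Table I for the first IDD iteration only. It deliberately simplifies several steps and is not the full ISS-MA. It assumes zero a priori LLRs (so $\bar{\mathbf{x}} = \mathbf{0}$, $\mathbf{\Lambda}_k = \mathbf{I}$, and $\xi$ is a constant that can be dropped), Gray-labeled QPSK, no ordering in STEP 1, and full look-ahead ($N_l = k-1$). It also clips one-sided LLRs instead of performing the list extension of STEP 5. Setting `use_lookahead=False` gives a conventional soft-output M-algorithm for comparison.

```python
import numpy as np

LABELS = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])           # assumed Gray labels
QPSK = (LABELS[:, 0] + 1j * LABELS[:, 1]) / np.sqrt(2)

def iss_ma_first_iteration(y0, H, sigma2, M, use_lookahead=True, clip=8.0):
    """Simplified M-algorithm with the LE-LA metric (first IDD iteration, QPSK)."""
    N = H.shape[1]
    Q1, R = np.linalg.qr(H)                    # thin QR; any diagonal phase convention works here
    y = Q1.conj().T @ y0
    # STEP 1: Z_k = sigma^2 (R11 R11^H + sigma^2 I)^{-1}, eq. (27) with Lambda_k = I
    Z = {k: sigma2 * np.linalg.inv(R[:k - 1, :k - 1] @ R[:k - 1, :k - 1].conj().T
                                   + sigma2 * np.eye(k - 1)) for k in range(2, N + 1)}
    # survivor = (symbol indices for levels k..N, a_k, causal metric); root: a_{N+1} = y
    survivors = [((), y, 0.0)]
    for k in range(N, 0, -1):                  # STEPS 3-4 over levels N, ..., 1
        children = []
        for idx, a, gc in survivors:
            for s, sym in enumerate(QPSK):
                upd = a - R[:k, k - 1] * sym   # eq. (32) with x_bar = 0
                a_k, v = upd[:k - 1], upd[k - 1]
                gc_new = gc + abs(v) ** 2      # eq. (34), constant prior term dropped
                gb = np.sum(np.abs(Z[k] @ a_k) ** 2) if (use_lookahead and k > 1) else 0.0
                children.append(((s,) + idx, a_k, gc_new, gc_new + gb))
        if k > 1:                              # keep the M best by gamma^(l) (or gamma^(c))
            children = sorted(children, key=lambda c: c[3])[:M]
        survivors = [c[:3] for c in children]
    # Candidate list: at level 1, gc equals ||y - R x||^2 for each complete path
    idx = np.array([c[0] for c in survivors])  # column j holds the index of x_{j+1}
    psi = -np.array([c[2] for c in survivors]) / sigma2   # psi(x) up to a constant
    bits = LABELS[idx]                         # shape (|L|, N, 2)
    llr = np.empty((N, 2))
    for k in range(N):
        for i in range(2):
            plus = bits[:, k, i] == 1
            best_p = psi[plus].max() if plus.any() else -np.inf
            best_m = psi[~plus].max() if (~plus).any() else -np.inf
            llr[k, i] = best_p - best_m        # max-log rule (11)
    return np.clip(llr, -clip, clip)           # clipping in place of list extension
```

When M is at least $4^{N-1}$ nothing is pruned, so the sketch reduces to exhaustive max-log detection; the look-ahead only matters once M is small enough for pruning to occur.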

IV. Performance Analysis

We discussed in the previous section that the transmitted symbols are always found with M = 1
if the genie-aided path metric is used. To derive the LE-LA path metric, the finite alphabet constraint
on the undecided symbols is relaxed and a Gaussian approximation is made for them. In this section,
we analyze the performance of the proposed M-algorithm employing the LE-LA path metric. As a
measure of performance, we consider the probability of a CPL event, i.e., the probability that the
tree search rejects the path associated with the transmitted symbols. In order to make the analysis
tractable, we focus on the case M = 1. Although our analysis covers only M = 1, a
lower CPL rate for M = 1 implies a greater likelihood of the correct symbols being selected
for M > 1 as well. The performance for M > 1 is evaluated via computer simulations in
Section V.

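Given the channel matrix $\mathbf{R}$ and the a priori LLRs, the probability of CPL can be expressed as

$$P_{\mathrm{CPL}} = 1 - \Pr(\mathbf{x} \in \mathcal{L} \mid \mathbf{x} \text{ is sent}) \tag{38}$$

$$= 1 - \mathrm{Pr}_{\mathbf{x}}\left(\bigcap_{k=1}^{N}\left\{\mathbf{x}_k^N \in \mathcal{L}_k\right\}\right) \tag{39}$$

$$= 1 - \prod_{k=1}^{N}\mathrm{Pr}_{\mathbf{x}}\left(\mathbf{x}_k^N \in \mathcal{L}_k \mid \mathbf{x}_{k+1}^N \in \mathcal{L}_{k+1}\right), \tag{40}$$

where $\mathcal{L}_k$ denotes the set of the paths selected at the $k$th level ($\mathcal{L}_{N+1}$ contains only the root) and $\mathrm{Pr}_{\mathbf{x}}(\cdot)$ is the probability given
that $\mathbf{x}$ is sent. Since we consider the case M = 1, $\mathbf{x}_{k+1}^N \in \mathcal{L}_{k+1}$ implies that the correct path has
been selected up to the $(k+1)$th level. With this setup and from (16), (18), and (25), one can show
that, for a candidate symbol $\check{x}_k \in \Theta$ appended to the correct path $\mathbf{x}_{k+1}^N$, $\gamma^{(l)}(\mathbf{x}_k^N)$ is given by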



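$$\gamma^{(l)}(\mathbf{x}_k^N) = \left|y_k - r_{k,k}\check{x}_k - \sum_{i=k+1}^{N} r_{k,i}x_i\right|^2 + \xi(\check{x}_k) + \gamma^{(c)}(\mathbf{x}_{k+1}^N) + \left\|\mathbf{Z}_k\left(\mathbf{y}_1^{k-1} - \mathbf{R}_{11,k}\bar{\mathbf{x}}_1^{k-1} - \mathbf{R}_{12,k}\begin{bmatrix}\check{x}_k\\ \mathbf{x}_{k+1}^N\end{bmatrix}\right)\right\|^2 \tag{41}$$

$$= \left|r_{k,k}(x_k - \check{x}_k) + n_k\right|^2 + \left\|\mathbf{Z}_k\mathbf{r}_k(x_k - \check{x}_k) + \mathbf{Z}_k\mathbf{h}_k\right\|^2 + \xi(\check{x}_k) + \gamma^{(c)}(\mathbf{x}_{k+1}^N) \tag{42}$$

$$= \left(\mathbf{r}_k^H\mathbf{Z}_k^H\mathbf{Z}_k\mathbf{r}_k + |r_{k,k}|^2\right)\left|(x_k - \check{x}_k) + \frac{\mathbf{r}_k^H\mathbf{Z}_k^H\mathbf{Z}_k\mathbf{h}_k + r_{k,k}^{*}n_k}{\mathbf{r}_k^H\mathbf{Z}_k^H\mathbf{Z}_k\mathbf{r}_k + |r_{k,k}|^2}\right|^2 + \xi(\check{x}_k) + C, \tag{43}$$

where $x_k$ denotes the transmitted symbol, $\mathbf{h}_k = \mathbf{R}_{11,k}(\mathbf{x}_1^{k-1} - \bar{\mathbf{x}}_1^{k-1}) + \mathbf{n}_1^{k-1}$, and $\mathbf{r}_k = \mathbf{R}_{12,k}\mathbf{e}_1 = [r_{1,k}, \ldots, r_{k-1,k}]^T$. Note that $C$ is
independent of the selection of $\check{x}_k$. The first term in (43) can be interpreted as the distance metric
between the output of a scalar additive noise channel

$$z_k = \sqrt{\mathbf{r}_k^H\mathbf{Z}_k^H\mathbf{Z}_k\mathbf{r}_k + |r_{k,k}|^2}\; x_k + \frac{\mathbf{r}_k^H\mathbf{Z}_k^H\mathbf{Z}_k\mathbf{h}_k + r_{k,k}^{*}n_k}{\sqrt{\mathbf{r}_k^H\mathbf{Z}_k^H\mathbf{Z}_k\mathbf{r}_k + |r_{k,k}|^2}} \tag{44}$$

and a symbol candidate $\sqrt{\mathbf{r}_k^H\mathbf{Z}_k^H\mathbf{Z}_k\mathbf{r}_k + |r_{k,k}|^2}\,\check{x}_k$. The ISS-MA chooses the M best symbols $\check{x}_k$ according
to the cost metric in (43). Since the a priori term $\xi(\check{x}_k)$ in (43) leads to better detection, we ignore its
impact in our discussion. If we let $E[\mathbf{h}_k\mathbf{h}_k^H] = \mathbf{\Sigma}_k = \mathbf{R}_{11,k}\mathbf{\Lambda}_k\mathbf{R}_{11,k}^H + \sigma_n^2\mathbf{I}$ and $\mathbf{Z}_k = \sigma_n^2\mathbf{\Sigma}_k^{-1}$,
then the signal-to-interference-plus-noise ratio (SINR) of the scalar additive noise channel is given
by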

rj^ZlE[hkhf]Zlrk + al\rk,k\' 

(46) 



1 








\^k,k\ j 








) rfe + 
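A quick numerical check of the completing-the-square step may help here. The sketch below assumes, following our reading of (41), that the metric of a candidate $\hat{x}_k$ is the causal term $|r_{k,k}(x_k-\hat{x}_k)+n_k|^2$ plus the weighted look-ahead term $(\mathbf{b}_k+\mathbf{r}_k(x_k-\hat{x}_k))^H\mathbf{Z}_k(\mathbf{b}_k+\mathbf{r}_k(x_k-\hat{x}_k))$. All matrices are random stand-ins (QPSK, $k-1=4$, and $\sigma_n^2=0.1$ are arbitrary choices), and the helper names are hypothetical. It checks that this metric and the scalar-channel distance in (43) differ only by a candidate-independent constant, so both rank the candidates identically.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Circularly symmetric CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

km1, sigma2 = 4, 0.1                                   # k - 1 and sigma_n^2 (illustrative)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

R11 = np.triu(crandn(km1, km1))                        # stand-in for R_{11,k}
r_k, r_kk = crandn(km1), crandn()                      # column above the diagonal, diagonal entry
lam = rng.uniform(0.2, 1.0, km1)                       # a priori variances (diagonal of Lambda_k)
Sigma = R11 @ np.diag(lam) @ R11.conj().T + sigma2 * np.eye(km1)
Z = sigma2 * np.linalg.inv(Sigma)                      # MMSE weight Z_k

b_k = crandn(km1)                                      # any realization of b_k
n_k = np.sqrt(sigma2) * crandn()
x_k = qpsk[0]                                          # transmitted symbol

A = np.real(r_k.conj() @ Z @ r_k) + abs(r_kk) ** 2
w = (r_k.conj() @ Z @ b_k + np.conj(r_kk) * n_k) / np.sqrt(A)   # noise term of (44)

diffs = []
for xh in qpsk:
    d = x_k - xh
    e = b_k + r_k * d
    metric_41 = abs(r_kk * d + n_k) ** 2 + np.real(e.conj() @ Z @ e)
    metric_43 = abs(np.sqrt(A) * x_k + w - np.sqrt(A) * xh) ** 2
    diffs.append(metric_41 - metric_43)

print(diffs)   # all entries agree up to rounding
```

Because the difference is the same for every candidate, sorting by either expression selects the same $M$ survivors.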





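Lemma 4.1: The SINR in (46) is bounded by

$$\sigma_n^2\,\mathbf{r}_k^H\boldsymbol{\Sigma}_k^{-2}\mathbf{r}_k + \frac{|r_{k,k}|^2}{\sigma_n^2} \le \mathrm{SINR} \le \mathbf{r}_k^H\boldsymbol{\Sigma}_k^{-1}\mathbf{r}_k + \frac{|r_{k,k}|^2}{\sigma_n^2}. \tag{47}$$

Proof: See Appendix C. ■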
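The bounds of Lemma 4.1 can be checked on a random instance. This sketch reads (46) as $(\mathbf{r}_k^H\mathbf{Z}_k\mathbf{r}_k+|r_{k,k}|^2)^2/(\mathbf{r}_k^H\mathbf{Z}_k\boldsymbol{\Sigma}_k\mathbf{Z}_k\mathbf{r}_k+\sigma_n^2|r_{k,k}|^2)$ with $E[\mathbf{b}_k\mathbf{b}_k^H]=\boldsymbol{\Sigma}_k$; the matrix size, the a priori variances, and the alternative weight $\mathbf{Z}=\mathbf{I}$ are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def crandn(*shape):
    """Circularly symmetric CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def sinr(Z, r_k, r_kk, Sigma, sigma2):
    """SINR of the scalar channel (44) for a Hermitian weight Z, as read from (46)."""
    num = (np.real(r_k.conj() @ Z @ r_k) + abs(r_kk) ** 2) ** 2
    den = np.real(r_k.conj() @ Z @ Sigma @ Z @ r_k) + sigma2 * abs(r_kk) ** 2
    return num / den

km1, sigma2 = 6, 0.05
R11 = np.triu(crandn(km1, km1))
r_k, r_kk = crandn(km1), crandn()
lam = rng.uniform(0.1, 1.0, km1)
Sigma = R11 @ np.diag(lam) @ R11.conj().T + sigma2 * np.eye(km1)
Sigma_inv = np.linalg.inv(Sigma)

causal = abs(r_kk) ** 2 / sigma2                                   # SINR of the causal metric
upper = np.real(r_k.conj() @ Sigma_inv @ r_k) + causal             # right side of (47)
lower = sigma2 * np.real(r_k.conj() @ Sigma_inv @ Sigma_inv @ r_k) + causal   # left side of (47)

s_mmse = sinr(sigma2 * Sigma_inv, r_k, r_kk, Sigma, sigma2)        # Z_k = sigma_n^2 Sigma_k^{-1}
s_plain = sinr(np.eye(km1), r_k, r_kk, Sigma, sigma2)              # hypothetical non-MMSE weight
print(lower, s_mmse, upper, s_plain)
```

With the MMSE weight $\mathbf{Z}_k=\sigma_n^2\boldsymbol{\Sigma}_k^{-1}$ the computed SINR coincides (up to rounding) with the upper bound in (47), consistent with a Cauchy-Schwarz argument; any other weight, such as $\mathbf{Z}=\mathbf{I}$, can only yield a smaller value.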
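Taking similar steps, one can show that the SINR for the causal path metric is $|r_{k,k}|^2/\sigma_n^2$.
Hence, $\mathbf{r}_k^H\boldsymbol{\Sigma}_k^{-1}\mathbf{r}_k$ and $\sigma_n^2\mathbf{r}_k^H\boldsymbol{\Sigma}_k^{-2}\mathbf{r}_k$ can be regarded as upper and lower bounds, respectively, on the SINR gain
achieved by the LE-LA path metric. It is of interest to examine the behavior of the upper
and lower bounds of the SINR gain for high-dimensional systems. Suppose that $N, L \to \infty$ with a fixed
aspect ratio $\beta = N/L$ ($0 < \beta < 1$), and let $\lambda_{\min}$ and $\lambda_{\max}$ be the smallest and largest diagonal entries of $\boldsymbol{\Lambda}_k$,
respectively. Then, we obtain a looser bound on the SINR:

$$\underbrace{\sigma_n^2\,\mathbf{r}_k^H\left(\sigma_n^2\mathbf{I} + \lambda_{\max}\mathbf{R}_{11,k}\mathbf{R}_{11,k}^H\right)^{-2}\mathbf{r}_k}_{B_k^{\mathrm{lower}}} + \frac{|r_{k,k}|^2}{\sigma_n^2} \le \mathrm{SINR} \le \underbrace{\mathbf{r}_k^H\left(\sigma_n^2\mathbf{I} + \lambda_{\min}\mathbf{R}_{11,k}\mathbf{R}_{11,k}^H\right)^{-1}\mathbf{r}_k}_{B_k^{\mathrm{upper}}} + \frac{|r_{k,k}|^2}{\sigma_n^2}, \tag{48}$$

where the upper and lower bounds of the SINR gain are denoted by $B_k^{\mathrm{upper}}$ and $B_k^{\mathrm{lower}}$, respectively.
Note that (48) can be shown from the relationship $\mathbf{B} \preceq \boldsymbol{\Sigma}_k \preceq \mathbf{A}$ (equivalently, $\mathbf{A}^{-1} \preceq \boldsymbol{\Sigma}_k^{-1} \preceq \mathbf{B}^{-1}$ and
$\mathbf{A}^{-2} \preceq \boldsymbol{\Sigma}_k^{-2}$), where $\mathbf{A} = \sigma_n^2\mathbf{I} + \lambda_{\max}\mathbf{R}_{11,k}\mathbf{R}_{11,k}^H$ and $\mathbf{B} = \sigma_n^2\mathbf{I} + \lambda_{\min}\mathbf{R}_{11,k}\mathbf{R}_{11,k}^H$, and $\mathbf{X} \succeq \mathbf{0}$ means
that the matrix $\mathbf{X}$ is positive semidefinite.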

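Theorem 4.2: For an $L \times N$ matrix $\mathbf{H}$ whose elements are i.i.d. random variables with zero mean
and variance $1/L$, the upper and lower bounds of the SINR gain for level $k = \gamma N + 1$ ($0 < \gamma < 1$)
converge to

\begin{align}
B_k^{\mathrm{upper}} \to B_k^{\mathrm{upper},\infty} &= \frac{1}{2\lambda_{\min}}\left(-1 - (1-\gamma\beta)\frac{\lambda_{\min}}{\sigma_n^2} + G\!\left(\frac{\lambda_{\min}}{\sigma_n^2}, \gamma\beta\right)\right), \tag{49}\\
B_k^{\mathrm{lower}} \to B_k^{\mathrm{lower},\infty} &= \frac{1}{2\sigma_n^2}\left(-(1-\gamma\beta) + \frac{1 + \gamma\beta + (1-\gamma\beta)^2\,\lambda_{\max}/\sigma_n^2}{G\!\left(\lambda_{\max}/\sigma_n^2, \gamma\beta\right)}\right), \tag{50}
\end{align}

as $N, L \to \infty$ with $\beta = N/L$, where $G(x, \delta) = \sqrt{1 + 2(1+\delta)x + (1-\delta)^2x^2}$.
Proof: See Appendix D. ■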
Corollary 4.3: As $\sigma_n^2 \to 0$, we have

\begin{align}
&B_k^{\mathrm{lower},\infty} \to 0, \tag{51}\\
&B_k^{\mathrm{upper},\infty} \text{ monotonically increases and approaches } \frac{1}{\lambda_{\min}}\,\frac{\gamma\beta}{1-\gamma\beta}. \tag{52}
\end{align}

We can deduce from (51) and (52) that the actual SINR gain approaches a deterministic value
in the interval $\left[0, \frac{1}{\lambda_{\min}}\frac{\gamma\beta}{1-\gamma\beta}\right]$. One can also show that both $B_k^{\mathrm{upper},\infty}$ and $B_k^{\mathrm{lower},\infty}$ are increasing
functions of $\gamma\beta \in (0, 1)$. Since $\gamma$ indexes the tree depth, the SINR bounds achieve
their maximum at the top level of the tree ($k = N$).
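To see how (49) through (52) behave numerically, the sketch below evaluates the closed forms and compares them with direct numerical integration against the Marcenko-Pastur density with ratio $\gamma\beta$, normalized for channel entries of variance $1/L$ (our assumption). The parameter values ($\gamma\beta=0.4$, $\sigma_n^2=0.1$, $\lambda_{\min}=0.3$, $\lambda_{\max}=0.9$) are arbitrary, and a simple midpoint rule stands in for exact integration.

```python
import numpy as np

def G(x, d):
    """G(x, delta) as defined after (49) and (50)."""
    return np.sqrt(1 + 2 * (1 + d) * x + (1 - d) ** 2 * x ** 2)

def b_upper_inf(lam_min, sigma2, c):
    """Closed form (49); c stands for gamma*beta."""
    x = lam_min / sigma2
    return (-1 - (1 - c) * x + G(x, c)) / (2 * lam_min)

def b_lower_inf(lam_max, sigma2, c):
    """Closed form (50)."""
    y = lam_max / sigma2
    return (-(1 - c) + (1 + c + (1 - c) ** 2 * y) / G(y, c)) / (2 * sigma2)

def mp_integral(func, c, n=400_000):
    """Midpoint-rule integral of func against the Marcenko-Pastur density with ratio c < 1."""
    a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
    h = (b - a) / n
    x = a + h * (np.arange(n) + 0.5)
    density = np.sqrt((x - a) * (b - x)) / (2 * np.pi * c * x)
    return float(np.sum(func(x) * density) * h)

c, sigma2, lam_min, lam_max = 0.4, 0.1, 0.3, 0.9       # illustrative values only
closed_upper = b_upper_inf(lam_min, sigma2, c)
numeric_upper = (c / sigma2) * mp_integral(lambda x: 1 / (1 + lam_min * x / sigma2), c)
closed_lower = b_lower_inf(lam_max, sigma2, c)
numeric_lower = (c / sigma2) * mp_integral(lambda x: 1 / (1 + lam_max * x / sigma2) ** 2, c)
limit_upper = c / ((1 - c) * lam_min)                   # sigma_n^2 -> 0 value in (52)
print(closed_upper, numeric_upper, closed_lower, numeric_lower, limit_upper)
```

Evaluating `b_upper_inf` and `b_lower_inf` at a very small noise variance (for example `1e-8`) reproduces the limits stated in Corollary 4.3.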

Next, we analyze the probability of CPL using the SINR obtained above. It is worth taking a closer look at
the term $\mathbf{Z}_k\mathbf{b}_k$ in (44). Recalling that $\mathbf{b}_k = \mathbf{R}_{11,k}\left(\mathbf{x}_1^{k-1} - \bar{\mathbf{x}}_1^{k-1}\right) + \mathbf{n}_1^{k-1}$ and $\mathbf{Z}_k = \sigma_n^2\left(\mathbf{R}_{11,k}\boldsymbol{\Lambda}_k\mathbf{R}_{11,k}^H + \sigma_n^2\mathbf{I}\right)^{-1}$,
we see that $\mathbf{Z}_k\mathbf{b}_k$ is an MMSE estimate of $\mathbf{n}_1^{k-1}$. To make the derivation of the CPL probability
more tractable, we use a Gaussian approximation for the MMSE estimation error $(\mathbf{n}_1^{k-1} - \mathbf{Z}_k\mathbf{b}_k)$ or,
equivalently, for $\mathbf{Z}_k\mathbf{b}_k$. Under this approximation, the interference plus noise of
the scalar channel is Gaussian. The validity of this approximation has been supported in many
asymptotic scenarios in [39] and [30]. In particular, it is shown in [41] that the Gaussian approximation is
highly accurate for large problem size $N$.

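Using the SINR in (46), the probability of CPL for the $k$th-level detection can be expressed as

$$\Pr\left(\mathbf{x}_k^N \notin \mathcal{L}_k \mid \mathbf{x}_{k+1}^N \in \mathcal{L}_{k+1}, \mathbf{H}\right) \le 4\left(1 - \frac{1}{\sqrt{|\mathcal{X}|}}\right) Q\left(\sqrt{\kappa\,\mathrm{SINR}}\right), \tag{53}$$

where $|\mathcal{X}|$ denotes the size of the QAM constellation and $\kappa$ is a constant determined by the constellation. The inequality in (53) follows from the existence of the a priori term in (43), which
lowers the actual CPL probability. From (47), we have

$$\Pr\left(\mathbf{x}_k^N \notin \mathcal{L}_k \mid \mathbf{x}_{k+1}^N \in \mathcal{L}_{k+1}, \mathbf{H}\right) \le 4\left(1 - \frac{1}{\sqrt{|\mathcal{X}|}}\right) Q\left(\sqrt{\kappa\left(\sigma_n^2\,\mathbf{r}_k^H\boldsymbol{\Sigma}_k^{-2}\mathbf{r}_k + \frac{|r_{k,k}|^2}{\sigma_n^2}\right)}\right). \tag{54}$$

Using (54), we can analyze the average probability of CPL for a random channel $\mathbf{H}$ whose elements
are independent complex Gaussian with distribution $\mathcal{CN}(0, 1)$. The average probability of CPL, denoted by $\bar{P}_{\mathrm{CPL}}$,
is given by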



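\begin{align}
\bar{P}_{\mathrm{CPL}} &= 1 - E_{\mathbf{H}}\left[\prod_{k=1}^{N}\left(1 - \Pr\left(\mathbf{x}_k^N \notin \mathcal{L}_k \mid \mathbf{x}_{k+1}^N \in \mathcal{L}_{k+1}, \mathbf{H}\right)\right)\right] \tag{55}\\
&= \sum_{k=1}^{N} E_{\mathbf{H}}\left[\Pr\left(\mathbf{x}_k^N \notin \mathcal{L}_k \mid \mathbf{x}_{k+1}^N \in \mathcal{L}_{k+1}, \mathbf{H}\right)\right] + \text{higher-order terms}, \tag{56}
\end{align}

where $E_{\mathbf{H}}[\cdot]$ denotes the expectation over $\mathbf{H}$. The average CPL probability is obtained by evaluating
$E_{\mathbf{H}}\left[\Pr\left(\mathbf{x}_k^N \notin \mathcal{L}_k \mid \mathbf{x}_{k+1}^N \in \mathcal{L}_{k+1}, \mathbf{H}\right)\right]$ for all $k$. In our analysis, we do not focus on the
higher-order terms since they become negligible in the high-SNR regime. Using the relationship
$Q(\sqrt{x+y}) \le Q(\sqrt{x})\exp\left(-\frac{y}{2}\right)$ for $x, y \ge 0$ and (54), we have

\begin{align}
E_{\mathbf{H}}\left[\Pr\left(\mathbf{x}_k^N \notin \mathcal{L}_k \mid \mathbf{x}_{k+1}^N \in \mathcal{L}_{k+1}, \mathbf{H}\right)\right]
&\le 4\left(1 - \frac{1}{\sqrt{|\mathcal{X}|}}\right) E_{\mathbf{H}}\left[Q\left(\sqrt{\kappa\left(\sigma_n^2\,\mathbf{r}_k^H\boldsymbol{\Sigma}_k^{-2}\mathbf{r}_k + \frac{|r_{k,k}|^2}{\sigma_n^2}\right)}\right)\right] \tag{57}\\
&\le 4\left(1 - \frac{1}{\sqrt{|\mathcal{X}|}}\right) E_{\mathbf{H}}\left[Q\left(\sqrt{\frac{\kappa|r_{k,k}|^2}{\sigma_n^2}}\right)\exp\left(-\frac{\kappa}{2}\sigma_n^2\,\mathbf{r}_k^H\boldsymbol{\Sigma}_k^{-2}\mathbf{r}_k\right)\right] \tag{58}\\
&= 4\left(1 - \frac{1}{\sqrt{|\mathcal{X}|}}\right) E_{\mathbf{H}}\left[Q\left(\sqrt{\frac{\kappa|r_{k,k}|^2}{\sigma_n^2}}\right)\right] E_{\mathbf{H}}\left[\exp\left(-\frac{\kappa}{2}\sigma_n^2\,\mathbf{r}_k^H\boldsymbol{\Sigma}_k^{-2}\mathbf{r}_k\right)\right], \tag{59}
\end{align}

where (59) follows from the independence of $r_{k,k}$ and $\mathbf{r}_k$. Noting that $2|r_{k,k}|^2$ has a chi-square distribution
with $2(L-k+1)$ degrees of freedom and $\mathbf{r}_k$ has independent complex Gaussian elements [..., Lemma
2.1], we have [44]

$$E_{\mathbf{H}}\left[Q\left(\sqrt{\frac{\kappa|r_{k,k}|^2}{\sigma_n^2}}\right)\right] = \left(\frac{1}{2} - \frac{1}{2}\sqrt{\frac{\kappa}{\kappa + 2\sigma_n^2}}\right)^{L-k+1} \sum_{l=0}^{L-k}\binom{L-k+l}{l}\left(\frac{1}{2} + \frac{1}{2}\sqrt{\frac{\kappa}{\kappa + 2\sigma_n^2}}\right)^{l}. \tag{60}$$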
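Both ingredients of (57) through (60) lend themselves to a quick numerical check: the Q-function bound used in (58), and the closed-form average in (60) under the assumption that $|r_{k,k}|^2$ is Gamma$(L-k+1, 1)$ distributed (that is, $2|r_{k,k}|^2$ is chi-square with $2(L-k+1)$ degrees of freedom, as for $\mathcal{CN}(0,1)$ channel entries). The values of $\kappa$, $\sigma_n^2$, $L$, and $k$ below are illustrative only.

```python
import math
import numpy as np

def qfunc(t):
    """Gaussian tail function Q(t)."""
    return 0.5 * math.erfc(t / math.sqrt(2))

# Bound used in (58): Q(sqrt(x + y)) <= Q(sqrt(x)) * exp(-y / 2) for x, y >= 0
grid = [0.0, 0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
bound_ok = all(
    qfunc(math.sqrt(x + y)) <= qfunc(math.sqrt(x)) * math.exp(-y / 2) + 1e-15
    for x in grid for y in grid
)

def avg_q_closed(kappa, sigma2, L, k):
    """Right-hand side of (60): average of Q(sqrt(kappa * |r_kk|^2 / sigma2))."""
    n = L - k + 1                      # |r_kk|^2 ~ Gamma(n, 1) for CN(0, 1) channel entries
    mu = math.sqrt(kappa / (kappa + 2 * sigma2))
    return ((1 - mu) / 2) ** n * sum(
        math.comb(n - 1 + l, l) * ((1 + mu) / 2) ** l for l in range(n)
    )

def avg_q_mc(kappa, sigma2, L, k, trials=200_000, seed=0):
    """Monte Carlo estimate of the same average."""
    rng = np.random.default_rng(seed)
    g = rng.gamma(shape=L - k + 1, scale=1.0, size=trials)   # samples of |r_kk|^2
    t = np.sqrt(kappa * g / sigma2)
    return float(np.mean(0.5 * np.vectorize(math.erfc)(t / math.sqrt(2))))

print(bound_ok, avg_q_closed(1.0, 0.5, 4, 3), avg_q_mc(1.0, 0.5, 4, 3))
```

The exponent $L-k+1$ of the closed form is what produces the high-SNR slope discussed after Lemma 4.4.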



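Lemma 4.4: An upper bound on the scaling gain in (59) is given by

$$E_{\mathbf{H}}\left[\exp\left(-\frac{\kappa}{2}\sigma_n^2\,\mathbf{r}_k^H\boldsymbol{\Sigma}_k^{-2}\mathbf{r}_k\right)\right] \le \int_0^\infty\!\!\cdots\!\int_0^\infty \prod_{i=1}^{k-1}\left(1 + \frac{\kappa\sigma_n^2}{2\left(\lambda_{\max}x_i + \sigma_n^2\right)^2}\right)^{-1} f_{\eta_1,\ldots,\eta_{k-1}}(x_1,\ldots,x_{k-1})\,dx_1\cdots dx_{k-1}, \tag{61}$$

where

$$f_{\eta_1,\ldots,\eta_{k-1}}(x_1,\ldots,x_{k-1}) = \frac{1}{(k-1)!}\exp\left(-\sum_{i=1}^{k-1}x_i\right)\prod_{i=1}^{k-1}\frac{x_i^{L-k+1}}{(k-1-i)!\,(L-i)!}\prod_{i<j}\left(x_i - x_j\right)^2.$$

Proof: See Appendix E. ■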
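The key step behind Lemma 4.4 (equations (93) and (94) in Appendix E) is that, for $\mathbf{r}_k \sim \mathcal{CN}(\mathbf{0},\mathbf{I})$ independent of $\mathbf{R}_{11,k}$, the conditional expectation of $\exp(-\mathbf{r}_k^H\mathbf{M}\mathbf{r}_k)$ equals $\prod_i (1+\mu_i)^{-1}$, where $\mu_i$ are the eigenvalues of $\mathbf{M}$. The sketch below checks this by Monte Carlo for one random $\mathbf{R}_{11,k}$ obtained from the QR decomposition of a $\mathcal{CN}(0,1)$ matrix; the sizes and parameter values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
km1, L, kappa, sigma2, lam_max = 3, 6, 1.0, 0.3, 0.8     # illustrative values

# Stand-in for R_{11,k}: triangular factor of the first k-1 columns of an L x N CN(0, 1) channel
H = (rng.standard_normal((L, km1)) + 1j * rng.standard_normal((L, km1))) / np.sqrt(2)
R11 = np.linalg.qr(H)[1]                                  # (k-1) x (k-1) upper triangular
eta = np.linalg.eigvalsh(R11 @ R11.conj().T)              # eigenvalues eta_i

A = sigma2 * np.eye(km1) + lam_max * R11 @ R11.conj().T
A_inv = np.linalg.inv(A)
M = 0.5 * kappa * sigma2 * A_inv @ A_inv                  # quadratic-form matrix of (92)

# Closed form of the inner expectation, as in (94)
closed = np.prod(1.0 / (1.0 + kappa * sigma2 / (2.0 * (lam_max * eta + sigma2) ** 2)))

# Monte Carlo over r_k only, with R11 held fixed
trials = 200_000
r = (rng.standard_normal((trials, km1)) + 1j * rng.standard_normal((trials, km1))) / np.sqrt(2)
quad = np.real(np.einsum("ti,ij,tj->t", r.conj(), M, r))  # r^H M r for each trial
mc = float(np.exp(-quad).mean())
print(closed, mc)
```

Averaging `closed` over many independent draws of `R11` would then approximate the outer expectation, i.e., the integral in (61).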
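While the scaling gain in (61) tends to one as $\sigma_n^2 \to 0$, (60) decreases to zero with slope
$\lim_{\sigma_n^2 \to 0} \ln(P_e)/\ln(\sigma_n^2) = L - k + 1$, where $P_e$ denotes the left-hand side of (60). Therefore, at high SNR, the probability of CPL for the top level
($k = N$) dominates, i.e.,

$$\bar{P}_{\mathrm{CPL}} \lesssim 4\left(1 - \frac{1}{\sqrt{|\mathcal{X}|}}\right) E_{\mathbf{H}}\left[Q\left(\sqrt{\frac{\kappa|r_{N,N}|^2}{\sigma_n^2}}\right)\right] E_{\mathbf{H}}\left[\exp\left(-\frac{\kappa}{2}\sigma_n^2\,\mathbf{r}_N^H\boldsymbol{\Sigma}_N^{-2}\mathbf{r}_N\right)\right], \tag{62}$$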



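where the right-hand side can be evaluated using (60) and (61). Following similar steps, we can also show
that the upper bound of the average CPL probability for the causal path metric becomes

$$\bar{P}_{\mathrm{CPL}}^{\mathrm{causal}} \lesssim 4\left(1 - \frac{1}{\sqrt{|\mathcal{X}|}}\right) E_{\mathbf{H}}\left[Q\left(\sqrt{\frac{\kappa|r_{N,N}|^2}{\sigma_n^2}}\right)\right]. \tag{63}$$

We observe from (62) and (63) that the upper bound on the average CPL probability of the LE-LA path metric is smaller than
that of the causal path metric by the factor $E_{\mathbf{H}}\left[\exp\left(-\frac{\kappa}{2}\sigma_n^2\,\mathbf{r}_N^H\boldsymbol{\Sigma}_N^{-2}\mathbf{r}_N\right)\right]$. Since this factor is strictly
less than unity, it corresponds to the scaling gain obtained from the LE-LA path metric.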

In Fig. 2, we plot the average CPL probability versus SNR for several system
sizes (N = 5, 10, 15, and 20), assuming uncoded QPSK transmission. The average CPL rate and
its upper bound are obtained from (53) and from the upper bound derived above, respectively. For a comprehensive view, we also include the
average CPL rate for the causal path metric in (63). For all cases considered, the CPL expression
in (53) is quite close to the simulation results, supporting the accuracy of the
analysis. In particular, the upper bound on the average CPL rate appears tight
at high SNR. Fig. 3 shows how the scaling gain in (61) varies as a function of SNR and system
size. We observe that the gain of the LE-LA path metric improves with system size
and that the maximum is achieved in the low-to-moderate SNR range (10 dB to 20 dB). This
behavior is desirable for IDD, since performance in the low-to-moderate SNR range is critical for triggering
performance improvement through iterations.



V. Simulation and Discussion 

In this section, we evaluate the performance of the ISS-MA through computer simulations. First,
we compare the performance of the soft-input soft-output M-algorithm employing the LE-LA path
metric with that of the same algorithm employing the conventional path metric. Note that the LE-LA path metric is not
restricted to a particular search scheme and can be extended to more sophisticated breadth-first
search algorithms (such as [12] and [23]). Next, we compare the performance-complexity trade-off of
the ISS-MA with those of existing soft-input soft-output detectors.
A. Simulation Setup

The simulation setup for the IDD system is as follows. A total of 2 x 10^ information bits are
randomly generated. A rate R = 1/2 recursive systematic convolutional (RSC) code with feedback
polynomial 1 + D + and feedforward polynomial 1 + is used. We use a random interleaver of size
12,000 bits and Gray mapping for QAM modulation. We assume fast-fading channels where
each entry of H is i.i.d. complex Gaussian CN(0, 1), and perfect knowledge of the channel state is
assumed at the receiver. For channel decoding, a max-log-MAP decoder [7] is employed. The
SNR is defined as the ratio, in dB, of the signal power to the noise variance σ_n^2. The computational complexity of the detectors is measured by
counting the average number of complex multiplications per symbol period and per iteration.¹
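Gray mapping for 16-QAM can be illustrated as follows. The bit-to-level table is an assumption (a common per-axis Gray labeling), since the exact labeling used in the simulations is not specified; `map_16qam` is a hypothetical helper, and symbols are normalized to unit average energy.

```python
import itertools
import numpy as np

# Assumed per-axis Gray labeling: adjacent amplitude levels differ in one bit
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def map_16qam(bits):
    """Map 4 bits (b0 b1 -> in-phase, b2 b3 -> quadrature) to a unit-energy 16-QAM symbol."""
    i = GRAY_PAM4[(bits[0], bits[1])]
    q = GRAY_PAM4[(bits[2], bits[3])]
    return (i + 1j * q) / np.sqrt(10)       # average of i^2 + q^2 over the grid is 10

labels = list(itertools.product((0, 1), repeat=4))
points = {b: map_16qam(b) for b in labels}

# Nearest neighbours are separated by 2/sqrt(10); with Gray mapping they differ in one bit
d_min = 2 / np.sqrt(10)
neighbor_pairs = [
    (a, b) for a, b in itertools.combinations(labels, 2)
    if abs(abs(points[a] - points[b]) - d_min) < 1e-9
]
gray_ok = all(sum(x != y for x, y in zip(a, b)) == 1 for a, b in neighbor_pairs)
energy = float(np.mean([abs(p) ** 2 for p in points.values()]))
print(len(neighbor_pairs), gray_ok, energy)
```

With this labeling, the most likely symbol errors (to a nearest neighbour) corrupt only one coded bit, which is why Gray mapping is the usual choice in IDD receivers.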

B. Simulation Results 

First, we compare the performance of the causal path metric and the LE-LA path metric. We
consider a 12 x 12 16-QAM MIMO system, which requires high detection complexity. For a fair
comparison, we employ the same candidate extension strategy with J = 16 (described in Section
III-D) for both algorithms. The parameter Ni is set to 5 for the LE-LA path metric. In Fig. 4,
the bit error rate (BER) versus SNR is plotted for several values of M (M = 4, 6, 8, and
12). Each plot shows the BER curves obtained after different numbers of iterations. The ISS-MA
outperforms the conventional M-algorithm for all M values and after each iteration. In particular,
with M = 4, the ISS-MA shows a remarkable performance gain (more than 5 dB). The
performance gap decreases as M increases. Note that the ISS-MA maintains strong performance
even with small M (e.g., M = 4). Table II lists the computational complexity of both algorithms
along with the SNR required to achieve a BER of $10^{-2}$ for the same setup. The SNR is measured
after the 7th iteration. To compare the performance-complexity trade-off, it is worth looking
at the ISS-MA with M = 4 and the M-algorithm with M = 8, since both
algorithms require similar computational complexity. In this case, the ISS-MA achieves almost 1
dB performance gain. We also observe that the ISS-MA converges
faster than the conventional M-algorithm, which may further reduce the complexity of the
ISS-MA through early termination of the iterations.
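The complexity-matched comparison above can be reproduced from the Table II entries with simple arithmetic. The sketch only restates the tabulated values; the dictionary layout is our own.

```python
# Values transcribed from Table II (12 x 12, 16-QAM, after the 7th iteration)
table2 = {
    # M: (LE-LA SNR [dB], LE-LA mults [k], conventional SNR [dB], conventional mults [k])
    4: (9.40, 133.77, 12.50, 120.23),
    6: (9.25, 145.30, 11.00, 125.50),
    8: (9.22, 157.53, 10.36, 132.00),
    12: (9.29, 183.12, 10.11, 145.98),
}

# Complexity-matched pair: ISS-MA with M = 4 versus the conventional M-algorithm with M = 8
gain_db = table2[8][2] - table2[4][0]          # SNR advantage at BER = 1e-2
cost_ratio = table2[4][1] / table2[8][3]       # ratio of complex multiplications
print(f"gain = {gain_db:.2f} dB, complexity ratio = {cost_ratio:.3f}")

# Per-M gain of the LE-LA metric at equal M
gains = [table2[m][2] - table2[m][0] for m in sorted(table2)]
print([round(g, 2) for g in gains])
```

The per-M gains shrink as M grows, matching the observation that the performance gap decreases with M.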

¹The complexity of QR decomposition and detection ordering is not counted, since these operations are common to all
detection algorithms under consideration.
Next, we examine how the performance gap between the ISS-MA and the conventional M-algorithm
changes with system size. Table III presents the SNR at a BER of $10^{-2}$ and the
complexity of both algorithms for N = L = 6, 8, 10, and 12. 16-QAM is used, and Ni and M are set
to 5 and 6, respectively, for all cases. The performance is measured after the 7th iteration. We observe that the
performance gain due to the LE-LA path metric increases with system size. In particular, the gain
of the ISS-MA is 0.5 dB for the 6 x 6 system and increases to 1.75 dB for the 12 x 12 system. This
indicates that accounting for the future cost plays a key role in large systems.

We next investigate the performance of the ISS-MA as a function of the parameter Ni (see Section
III-C). In our simulations, 12 x 12 16-QAM transmission is considered and M is set to 8. Figs. 5(a)
and (b) show the performance and complexity of the ISS-MA for different Ni. Note that the
ISS-MA with Ni = 0 reduces to the conventional M-algorithm. As Ni increases, the
ISS-MA accounts for more of the future cost, so that the computational complexity increases and the
performance improves. Thus, Ni controls the performance-complexity trade-off of the ISS-MA. While the
performance of the ISS-MA improves considerably as Ni increases from small values, the effect of Ni diminishes for
larger Ni. For the cases considered, Ni = 5 is sufficient to achieve the maximal performance
gain offered by the LE-LA path metric.

We also examine the performance of the ISS-MA for spatially correlated MIMO channels.
We model a correlated MIMO channel as $\mathbf{H}_c = \mathbf{R}_r^{1/2}\,\mathbf{H}\,\mathbf{R}_t^{1/2}$, where $\mathbf{R}_t$ is the $N \times N$ transmit
correlation matrix and $\mathbf{R}_r$ is the $L \times L$ receive correlation matrix. Fig. 6 shows the BER versus
SNR of the ISS-MA and the conventional soft-input soft-output M-algorithm for correlated channels
with 
Note that two antennas are less correlated as the spacing between them increases. The 12 x 12 16-QAM
system is considered, and the parameters M and Ni are set to 12 and 5, respectively. Comparing the
result in Fig. 6 with that in Fig. 4(d), we observe that the performance of both algorithms
degrades in correlated channels, but the performance gain of the ISS-MA over the conventional
algorithm is even larger. From this observation, we conclude that the LE-LA path metric can be
even more effective when the channel gains are correlated.
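For reference, one way to generate channels of the form $\mathbf{H}_c=\mathbf{R}_r^{1/2}\mathbf{H}\mathbf{R}_t^{1/2}$ is sketched below. The exponential correlation model $[\mathbf{R}]_{i,j}=\rho^{|i-j|}$ is an assumption chosen for illustration (correlation decays with antenna separation); it is not necessarily the model behind Fig. 6, and the function names are hypothetical.

```python
import numpy as np

def exp_corr(n, rho):
    """Exponential correlation model R[i, j] = rho**|i - j| (assumed model)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def psd_sqrt(R):
    """Hermitian square root R^{1/2} of a positive semidefinite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(R)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def correlated_channel(L, N, rho_r, rho_t, rng):
    """Draw H_c = R_r^{1/2} H R_t^{1/2} with i.i.d. CN(0, 1) entries in H."""
    Hw = (rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))) / np.sqrt(2)
    return psd_sqrt(exp_corr(L, rho_r)) @ Hw @ psd_sqrt(exp_corr(N, rho_t))

rng = np.random.default_rng(3)
Hc = correlated_channel(12, 12, 0.5, 0.5, rng)   # one 12 x 12 realization
print(Hc.shape)
```

Under this model, $E[\mathbf{H}_c\mathbf{H}_c^H] = \mathrm{tr}(\mathbf{R}_t)\,\mathbf{R}_r$, so averaging $\mathbf{H}_c\mathbf{H}_c^H/N$ over many draws recovers $\mathbf{R}_r$ when the diagonal of $\mathbf{R}_t$ is all ones.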
Finally, we compare the performance-complexity trade-off of the ISS-MA with those of
existing soft-input soft-output detection algorithms. For a comprehensive picture, we consider
the following algorithms:

1) MMSE-PIC algorithm: MMSE parallel interference cancellation detector [1], [2]. This detector
subtracts the a priori estimates of the interfering symbols from the received vector and then applies
a linear MMSE estimator to obtain soft estimates of the symbols.

2) LISS algorithm (|S|, |Sx|): List sequential stack algorithm [13]. It is characterized by the size of
the stack |S| and that of the auxiliary stack |Sx|.

3) LFCSD algorithm (Nc, Ns): List fixed-complexity sphere decoder [14]. A candidate list is found
by the fixed-complexity sphere decoder proposed in [20]. This detector is characterized by Nc
and Ns, which represent the size of the candidate list and the number of fully extended paths,
respectively.

4) ITS algorithm (M): Iterative tree search [11]. This detector uses the conventional M-algorithm
to find the candidate list.
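The MMSE-PIC detector in item 1) can be sketched as follows. This is a generic formulation assuming unit-energy symbols with a priori means and variances, in which the symbol of interest is treated with unit variance; it is not necessarily the exact variant of [1], [2], and `mmse_pic` is a hypothetical helper name.

```python
import numpy as np

def mmse_pic(y, H, x_bar, var, sigma2):
    """Soft MMSE-PIC estimates (generic sketch).

    y: received vector (L,), H: channel (L, N), x_bar / var: a priori means and
    variances of the N symbols, sigma2: noise variance. Returns unbiased soft
    estimates z and the filter gains mu of each symbol.
    """
    L, N = H.shape
    z = np.empty(N, dtype=complex)
    mu = np.empty(N)
    for k in range(N):
        # Parallel interference cancellation: remove a priori estimates of all other symbols
        x_int = x_bar.astype(complex).copy()
        x_int[k] = 0.0
        y_k = y - H @ x_int
        # Residual covariance: symbol k with unit energy, others with their a priori variances
        v = np.asarray(var, dtype=float).copy()
        v[k] = 1.0
        C = (H * v) @ H.conj().T + sigma2 * np.eye(L)
        w = np.linalg.solve(C, H[:, k])          # linear MMSE filter for symbol k
        mu[k] = np.real(w.conj() @ H[:, k])      # gain of x_k at the filter output
        z[k] = (w.conj() @ y_k) / mu[k]          # unbiased soft estimate of x_k
    return z, mu
```

With perfect priors (zero variances and means equal to the true symbols) all interference is cancelled and the estimate equals the transmitted symbol; with uninformative priors the filter reduces to the ordinary linear MMSE equalizer.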

Note that the parameters of the ISS-MA are indicated as "ISS-MA (M, Ni)". Although the LSD
[3] and single (parallel) tree search (STS) [9], [10] are considered powerful detection schemes,
their complexities grow so rapidly with problem size that they are infeasible for the 12 x 12 system. For
this reason, we only consider fixed-complexity detectors. In Fig. 7, the performance and complexity
of each algorithm are shown in the same plot to compare the performance-complexity trade-offs of
the detectors. Owing to its linear structure, the MMSE-PIC has the lowest complexity among all
candidates. In addition, the performance of the MMSE-PIC is better than that of the LISS, LFCSD,
and ITS. This is likely because the performance of the latter detectors depends on the candidate list size, and
this size is not large enough to achieve good performance in the 12 x 12 system. In particular, due to
the limited stack size, the stack memory used in the LISS easily becomes full before reaching a leaf
of the tree, so that the LISS often fails to find reliable candidates. Fig. 7 shows that only the ISS-MA
achieves better performance than the MMSE-PIC. Owing to its improved candidate selection
process, the ISS-MA finds reliable candidates with only a small candidate list, thereby yielding the
best BER performance while maintaining reasonable complexity. In conclusion, the ISS-MA achieves
the best performance-complexity trade-off among all tree detectors considered. In addition, the ISS-MA
provides performance gains over the MMSE-PIC at the expense of higher, but manageable,
complexity.
VI. Conclusions

In this paper, we presented a new path metric that shows great promise in terms of the performance-complexity
trade-off for soft-input soft-output tree detection in an IDD system. By accounting for
non-causal symbols in the linear estimate-based look-ahead (LE-LA) path metric, performance
gains over the existing causal path metric are achieved. We applied the LE-LA path metric to the
soft-input soft-output M-algorithm. By adopting a sorting mechanism that exploits the LE-LA path
metric, we improve the chance of selecting the correct path dramatically, thereby achieving
good detection and decoding performance with fewer iterations. From the CPL probability analysis, we
observed that the LE-LA path metric reflects the reliability of selected paths much better than the
causal path metric. Computer simulations confirm that the proposed ISS-MA can be a promising
candidate for soft-input soft-output detection in high-dimensional systems.



Appendix A
Proof of Theorem 3.2

The transformed vector $\mathbf{y}$ can be expressed as $\mathbf{y} = \mathbf{R}\mathbf{x} + \mathbf{n}$, where $\mathbf{n} = \mathbf{Q}^H\mathbf{n}_0$. Let $k$ be the current
layer being searched; then $\mathbf{y}$, $\mathbf{x}$, and $\mathbf{n}$ can be partitioned into $(k-1) \times 1$ and $(N-k+1) \times 1$
vectors, i.e.,
where the upper triangular matrix $\mathbf{R}$ is partitioned into four sub-matrices. Given the transmitted
Hence, we can show that
(70)
where (73) follows from the definition of the branch metric. Hence, for = x^, we
have min,.-i Eti b (xf ) = Et/ b (xf )

Appendix B 
Proof of 

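We can express $\mathbf{Z}_{k+1}$ in (27) as

$$\mathbf{Z}_{k+1} = \sigma_n^2\left(\mathbf{R}_{11,k+1}\boldsymbol{\Lambda}_{k+1}\mathbf{R}_{11,k+1}^H + \sigma_n^2\mathbf{I}\right)^{-1}.$$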
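To obtain the update formula, we use the inverse of a partitioned matrix $\mathbf{A}$ given by

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}. \tag{74}$$

From [..., Appendix 1.1.3], we have

$$\mathbf{A}^{-1} = \begin{bmatrix} \mathbf{A}_{11}^{-1} + \mathbf{A}_{11}^{-1}\mathbf{A}_{12}\mathbf{S}^{-1}\mathbf{A}_{21}\mathbf{A}_{11}^{-1} & -\mathbf{A}_{11}^{-1}\mathbf{A}_{12}\mathbf{S}^{-1} \\ -\mathbf{S}^{-1}\mathbf{A}_{21}\mathbf{A}_{11}^{-1} & \mathbf{S}^{-1} \end{bmatrix}, \qquad \mathbf{S} = \mathbf{A}_{22} - \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12}. \tag{75}$$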
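The standard partitioned-matrix inverse (Schur-complement form) used in this appendix can be sanity-checked numerically on a random symmetric positive-definite matrix; the block sizes and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2 = 3, 2
B = rng.standard_normal((n1 + n2, n1 + n2))
A = B @ B.T + np.eye(n1 + n2)                  # symmetric positive definite test matrix
A11, A12 = A[:n1, :n1], A[:n1, n1:]
A21, A22 = A[n1:, :n1], A[n1:, n1:]

A11_inv = np.linalg.inv(A11)
S = A22 - A21 @ A11_inv @ A12                  # Schur complement of A11
S_inv = np.linalg.inv(S)
blockwise = np.block([
    [A11_inv + A11_inv @ A12 @ S_inv @ A21 @ A11_inv, -A11_inv @ A12 @ S_inv],
    [-S_inv @ A21 @ A11_inv, S_inv],
])
print(np.max(np.abs(blockwise - np.linalg.inv(A))))
```

In the recursion of this appendix the lower-right block is a scalar, so the Schur complement inverse is a simple division, which is what makes the update of $\mathbf{Z}_k$ cheap.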

Let All = f^n (Z/c) ^ + Xk+i'^k+i'^'k+ii ^12 = Afc+irfc+i^fc+iFfc+i, A21 = Afc+irfe+i^fc+ir^^^^, and A22 
+ (T^ , then ([75]) becomes 

Zfc — XAfc+iZfcrfc+ir|{,_]^Zfc — XAfc+irfc+i^fc+iZ^rfc+i 
~K\k+irk+i,k+i'^k+i^k K (Afc+irf^iZfcrfc+i + a^) 
where
Appendix C
Proof of Lemma 4.1

Let $\boldsymbol{\Sigma}_k$ be decomposed as $\mathbf{U}\boldsymbol{\Phi}_k\mathbf{U}^H$, where $\phi_1 \ge \phi_2 \ge \cdots \ge \phi_{k-1}$ are the eigenvalues of $\boldsymbol{\Sigma}_k$. Then,
the upper bound of the SINR is given by
$\mathbf{U}\mathbf{r}_k$, and the bound follows from the Cauchy-Schwarz inequality.



Next, with A = rj^ (^cr^Sj.^^ rfc + |rfc^fcp and B = (^cj^Sj.^^ rfc + |rfc^fcp, we can show 
This becomes the lower bound of the SINR.
Appendix D
Proof of Theorem 4.2

Let $\mathbf{H}_{1:k-1}$ be the matrix formed by the first $k-1$ columns of $\mathbf{H}$. Since $\mathbf{H}_{1:k-1} = \mathbf{Q}\begin{bmatrix}\mathbf{R}_{11,k}\\ \mathbf{0}\end{bmatrix}$,
the matrices $\mathbf{R}_{11,k}\mathbf{R}_{11,k}^H$ and $\mathbf{H}_{1:k-1}^H\mathbf{H}_{1:k-1}$ share the same eigenvalues. For an i.i.d. random matrix $\mathbf{H}$,



January 20, 2013 



DRAFT 



25 



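the elements of $\mathbf{H}_{1:k-1}$ are zero-mean and independent with variance $1/L$. According to [43, Lemma
2.29], as $N, L \to \infty$ with $\beta = N/L$, $B_k^{\mathrm{upper}}$ converges almost surely to

$$B_k^{\mathrm{upper}} = \mathbf{r}_k^H\left(\sigma_n^2\mathbf{I} + \lambda_{\min}\mathbf{R}_{11,k}\mathbf{R}_{11,k}^H\right)^{-1}\mathbf{r}_k \to \frac{\gamma\beta}{\sigma_n^2}\int_0^\infty \frac{1}{1 + \frac{\lambda_{\min}}{\sigma_n^2}x}\,f_\eta(x)\,dx, \tag{86}$$

where $f_\eta(x)$ is the empirical eigenvalue distribution of $\mathbf{H}_{1:k-1}^H\mathbf{H}_{1:k-1}$. According to the Marcenko-Pastur
law [35, Theorem 2.35], as $N, L \to \infty$ with $\beta = N/L$, $f_\eta(x)$ converges almost surely to

$$f_\eta^\infty(x) = \frac{\sqrt{(x-a)^+(b-x)^+}}{2\pi\gamma\beta x}, \qquad a = \left(1 - \sqrt{\gamma\beta}\right)^2, \quad b = \left(1 + \sqrt{\gamma\beta}\right)^2, \tag{87}$$

where $(x)^+ = \max(0, x)$. Hence, from (86) and (87), we obtain

$$B_k^{\mathrm{upper},\infty} = \frac{\gamma\beta}{\sigma_n^2}\int_0^\infty \frac{f_\eta^\infty(x)}{1 + \frac{\lambda_{\min}}{\sigma_n^2}x}\,dx = \frac{1}{2\lambda_{\min}}\left(-1 - (1-\gamma\beta)\frac{\lambda_{\min}}{\sigma_n^2} + G\!\left(\frac{\lambda_{\min}}{\sigma_n^2}, \gamma\beta\right)\right). \tag{88}$$

In a similar manner, the lower bound converges to

\begin{align}
B_k^{\mathrm{lower},\infty} &= \frac{\gamma\beta}{\sigma_n^2}\int_0^\infty \frac{f_\eta^\infty(x)}{\left(1 + \frac{\lambda_{\max}}{\sigma_n^2}x\right)^2}\,dx \tag{89}\\
&= \frac{1}{2\sigma_n^2}\left(-(1-\gamma\beta) + \frac{1 + \gamma\beta + (1-\gamma\beta)^2\,\lambda_{\max}/\sigma_n^2}{G\!\left(\lambda_{\max}/\sigma_n^2, \gamma\beta\right)}\right). \tag{90}
\end{align}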



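Appendix E
Proof of Lemma 4.4

Let $\eta_1, \eta_2, \ldots, \eta_{k-1}$ be the unordered eigenvalues of $\mathbf{R}_{11,k}\mathbf{R}_{11,k}^H$. The scaling gain in (59) can be
expressed as

\begin{align}
&E_{\mathbf{H}}\left[\exp\left(-\frac{\kappa}{2}\sigma_n^2\,\mathbf{r}_k^H\boldsymbol{\Sigma}_k^{-2}\mathbf{r}_k\right)\right] \tag{91}\\
&\quad\le E_{\mathbf{H}}\left[\exp\left(-\frac{\kappa}{2}\sigma_n^2\,\mathbf{r}_k^H\left(\sigma_n^2\mathbf{I} + \lambda_{\max}\mathbf{R}_{11,k}\mathbf{R}_{11,k}^H\right)^{-2}\mathbf{r}_k\right)\right] \tag{92}\\
&\quad= E_{\mathbf{R}_{11,k}}\left[E_{\mathbf{r}_k}\left[\exp\left(-\frac{\kappa}{2}\sigma_n^2\,\mathbf{r}_k^H\left(\sigma_n^2\mathbf{I} + \lambda_{\max}\mathbf{R}_{11,k}\mathbf{R}_{11,k}^H\right)^{-2}\mathbf{r}_k\right)\,\middle|\,\mathbf{R}_{11,k}\right]\right] \tag{93}\\
&\quad= E_{\mathbf{R}_{11,k}}\left[\prod_{i=1}^{k-1}\left(1 + \frac{\kappa\sigma_n^2}{2\left(\lambda_{\max}\eta_i + \sigma_n^2\right)^2}\right)^{-1}\right], \tag{94}
\end{align}

where (93) follows from $E[x] = E[E[x|y]]$ and (94) follows from $\mathbf{r}_k$ being $\mathcal{CN}(\mathbf{0}, \mathbf{I})$ and independent
of $\mathbf{R}_{11,k}$. Let $\mathbf{H}_{1:k-1}$ be the matrix formed by the first $k-1$ columns of $\mathbf{H}$; then the matrices
$\mathbf{R}_{11,k}\mathbf{R}_{11,k}^H$ and $\mathbf{H}_{1:k-1}^H\mathbf{H}_{1:k-1}$ share the same eigenvalues. The pdf of the unordered eigenvalues of
$\mathbf{H}_{1:k-1}^H\mathbf{H}_{1:k-1}$ is given by

$$f_{\eta_1,\ldots,\eta_{k-1}}(x_1,\ldots,x_{k-1}) = \frac{1}{(k-1)!}\exp\left(-\sum_{i=1}^{k-1}x_i\right)\prod_{i=1}^{k-1}\frac{x_i^{L-k+1}}{(k-1-i)!\,(L-i)!}\prod_{i<j}\left(x_i - x_j\right)^2, \tag{95}$$

which completes the proof.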






References

[1] M. Tüchler, R. Koetter, and A. C. Singer, "Turbo equalization: principles and new results," IEEE Trans. Commun., vol. 50, pp. 754-767, May 2002.
[2] M. Sellathurai and S. Haykin, "Turbo-BLAST for wireless communications: theory and experiments," IEEE Trans. Signal Processing, vol. 50, pp. 2538-2546, Oct. 2002.
[3] B. Hochwald and S. ten Brink, "Achieving near-capacity on a multiple-antenna channel," IEEE Trans. Commun., vol. 51, pp. 389-399, March 2003.
[4] H. Vikalo, B. Hassibi, and T. Kailath, "Iterative decoding for MIMO channels via modified sphere decoding," IEEE Trans. Wireless Commun., vol. 3, pp. 2299-2311, Nov. 2004.
[5] X. Wang and H. V. Poor, "Iterative (turbo) soft interference cancellation and decoding for coded CDMA," IEEE Trans. Commun., vol. 47, pp. 1046-1061, July 1999.
[6] C. Berrou and A. Glavieux, "Near optimum error-correcting coding and decoding: Turbo-codes," IEEE Trans. Commun., vol. 44, pp. 1261-1271, Oct. 1996.
[7] P. Robertson, P. Hoeher, and E. Villebrun, "Optimal and sub-optimal maximum a posteriori algorithms suitable for turbo decoding," European Trans. on Telecommun., vol. 8, pp. 119-125, March 1997.
[8] R. Wang and G. B. Giannakis, "Approaching MIMO channel capacity with soft detection based on hard sphere decoding," IEEE Trans. Commun., vol. 54, pp. 587-590, April 2006.
[9] J. Jaldén and B. Ottersten, "Parallel implementation of a soft output sphere decoder," Proc. IEEE Asilomar Conference on Signals, Systems, and Computers, Nov. 2005, pp. 581-585.
[10] C. Studer, A. Burg, and H. Bölcskei, "Soft-output sphere decoding: algorithms and VLSI implementation," IEEE Journal on Selected Areas in Commun., vol. 26, pp. 290-300, Feb. 2008.
[11] Y. L. C. de Jong and T. J. Willink, "Iterative tree search detection for MIMO wireless systems," IEEE Trans. Commun., vol. 53, pp. 930-935, June 2005.
[12] D. L. Milliner, E. Zimmermann, J. R. Barry, and G. Fettweis, "A fixed-complexity smart candidate adding algorithm for soft output MIMO detection," IEEE Journal of Selected Topics in Signal Processing, vol. 3, pp. 1016-1025, Dec. 2009.
[13] J. Hagenauer and C. Kuhn, "The list-sequential (LISS) algorithm and its application," IEEE Trans. Commun., vol. 55, pp. 918-928, May 2007.
[14] L. G. Barbero and T. S. Thompson, "Extending a fixed-complexity sphere decoder to obtain likelihood information for turbo-MIMO systems," IEEE Trans. Veh. Technol., vol. 57, no. 5, pp. 2804-2814, Sep. 2008.
[15] E. G. Larsson and J. Jaldén, "Fixed-complexity soft MIMO detection via partial marginalization," IEEE Trans. Signal Processing, vol. 56, pp. 3397-3407, Aug. 2008.
[16] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Commun. and Networking, vol. 2010, pp. 1-13, 2010.
[17] A. D. Murugan, H. El Gamal, M. O. Damen, and G. Caire, "A unified framework for tree search decoding: rediscovering the sequential decoder," IEEE Trans. Information Theory, vol. 52, pp. 933-953, March 2006.
[18] U. Fincke and M. Pohst, "Improved methods for calculating vectors of short length in a lattice, including a complexity analysis," Math. Comput., vol. 44, pp. 463-471, April 1985.
[19] B. Hassibi and H. Vikalo, "On the sphere-decoding algorithm I. Expected complexity," IEEE Trans. Signal Processing, vol. 53, pp. 2806-2818, Aug. 2005.






[20] J. Jaldén, L. G. Barbero, B. Ottersten, and J. S. Thompson, "Full diversity detection in MIMO systems with a fixed-complexity sphere decoder," IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2007, pp. III-49-III-52.
[21] J. B. Anderson and S. Mohan, "Sequential coding algorithms: A survey and cost analysis," IEEE Trans. Commun., vol. COM-32, no. 2, pp. 169-176, Feb. 1984.
[22] K. Wong, C. Tsui, R. S. Cheng, and W. Mow, "A VLSI architecture of the K-best lattice decoding algorithm for MIMO channels," IEEE ISCAS, Scottsdale, AZ, USA, May 2002, pp. 273-276.
[23] Z. Guo and P. Nilsson, "Algorithm and implementation of the K-best sphere decoding for MIMO detection," IEEE Journal on Selected Areas in Commun., vol. 24, pp. 491-503, March 2006.
[24] P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A. Valenzuela, "V-BLAST: an architecture for realizing very high data rates over the rich-scattering wireless channel," Proc. URSI Int. Symp. Signals, Syst., Electron., Sep. 1998, pp. 295-300.
[25] N. J. Nilsson, Principles of Artificial Intelligence. Palo Alto, CA: Tioga Publishing Co., 1980.
[26] Y. S. Han, C. R. P. Hartmann, and C. Chen, "Efficient priority-first search maximum-likelihood soft-decision decoding of linear block codes," IEEE Trans. Information Theory, vol. 39, pp. 1514-1523, Sep. 1993.
[27] L. Ekroot and S. Dolinar, "A* decoding of block codes," IEEE Trans. Commun., vol. 44, pp. 1052-1056, Sep. 1996.
[28] M. Stojnic, H. Vikalo, and B. Hassibi, "Speeding up the sphere decoder with H∞ and SDP inspired lower bounds," IEEE Trans. Signal Processing, vol. 56, pp. 712-726, Feb. 2008.
[29] R. Johannesson and K. S. Zigangirov, Fundamentals of Convolutional Coding. Wiley-IEEE Press, 1999.
[30] F. Xiong, A. Zerik, and E. Shwedyk, "Sequential sequence estimation for channels with intersymbol interference of finite or infinite length," IEEE Trans. Commun., vol. 38, pp. 795-804, June 1990.
[31] R. Gowaikar and B. Hassibi, "Statistical pruning for near-maximum likelihood decoding," IEEE Trans. Signal Processing, vol. 55, pp. 2661-2675, June 2007.
[32] B. Shim and I. Kang, "Sphere decoding with a probabilistic tree pruning," IEEE Trans. Signal Processing, vol. 56, pp. 4867-4878, Oct. 2008.
[33] T. Cui, T. Ho, and C. Tellambura, "Statistical pruning for near maximum likelihood detection of MIMO systems," IEEE International Conference on Commun. (ICC), June 2007, pp. 5462-5467.
[34] W. Zhao and G. B. Giannakis, "Reduced complexity closest point decoding algorithms for random lattices," IEEE Trans. Wireless Commun., vol. 5, pp. 101-111, Jan. 2006.
[35] H. V. Poor, An Introduction to Signal Detection and Estimation, 2nd ed. Springer, 1994.
[36] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Addison Wesley Longman, 1993.
[37] M. Tüchler, A. C. Singer, and R. Koetter, "Minimum mean squared error equalization using a priori information," IEEE Trans. Signal Processing, vol. 50, pp. 673-683, March 2002.
[38] D. Milliner, E. Zimmermann, J. R. Barry, and G. Fettweis, "Channel state information based LLR clipping in list MIMO detection," Proc. IEEE PIMRC, Sept. 2008, pp. 15-18.
[39] H. V. Poor and S. Verdú, "Probability of error in MMSE multiuser detection," IEEE Trans. Information Theory, vol. 43, pp. 858-871, May 1997.
[40] P. Li, D. Paul, R. Narasimhan, and J. Cioffi, "On the distribution of SINR for the MMSE MIMO receiver and performance analysis," IEEE Trans. Information Theory, vol. 52, pp. 271-286, Jan. 2006.






[41] D. Guo, S. Verdú, and L. K. Rasmussen, "Asymptotic normality of linear multiuser receiver outputs," IEEE Trans. Information Theory, vol. 48, pp. 3080-3095, Dec. 2002.
[42] J. Proakis, Digital Communications, 4th ed. McGraw-Hill, 2001.
[43] A. M. Tulino and S. Verdú, Random Matrix Theory and Wireless Communications. Foundations and Trends in Communications and Information Theory, 2004.
[44] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2005.
[45] S. ten Brink, "Convergence of iterative decoding," Electron. Lett., vol. 35, pp. 806-808, May 1999.
[46] A. Edelman, "Eigenvalues and condition numbers of random matrices," Ph.D. thesis, Dept. Math., Massachusetts Inst. Technol., Cambridge, 1989.






[Fig. 2 plot panels: 5x5, 10x10, 15x15, and 20x20 systems, QPSK; horizontal axis: SNR; curves: CPL rate for causal path metric, upper bound (anal.), CPL rate (anal.), CPL rate (simul.)]

Fig. 2. Average CPL probability versus SNR for the (a) 5 x 5, (b) 10 x 10, (c) 15 x 15, and (d) 20 x 20 systems. Uncoded
QPSK transmission is considered. The curves labeled "CPL rate (anal.)" are obtained by Monte Carlo averaging of
(53) over i.i.d. Gaussian channels.
[Fig. 3 plot: scaling gain versus SNR; curves for the 5x5, 10x10, 15x15, and 20x20 systems]

Fig. 3. Scaling gain versus SNR for different system sizes N = 5, 10, 15, and 20.
Fig. 4. Comparison between the causal path metric and the LE-LA path metric for the 12 x 12 16-QAM system with
(a) M = 4, (b) M = 6, (c) M = 8, and (d) M = 12.
TABLE II

Performance/complexity of the 12 x 12 16-QAM system for different M values.

         | LE-LA path metric                      | Conventional path metric
         | SNR (at BER = 1%) | # of multiplications | SNR (at BER = 1%) | # of multiplications
M = 4    | 9.40 dB           | 133.77k              | 12.50 dB          | 120.23k
M = 6    | 9.25 dB           | 145.30k              | 11.00 dB          | 125.50k
M = 8    | 9.22 dB           | 157.53k              | 10.36 dB          | 132.00k
M = 12   | 9.29 dB           | 183.12k              | 10.11 dB          | 145.98k



TABLE III

Performance/complexity for different problem sizes. M is set to 6.

                 | LE-LA path metric                      | Conventional path metric
                 | SNR (at BER = 1%) | # of multiplications | SNR (at BER = 1%) | # of multiplications
6 x 6 16-QAM     | 8.80 dB           | 24.30k               | 9.29 dB           | 18.79k
8 x 8 16-QAM     | 8.97 dB           | 50.79k               | 9.59 dB           | 40.87k
10 x 10 16-QAM   | 9.22 dB           | 90.07k               | 10.39 dB          | 75.19k
12 x 12 16-QAM   | 9.25 dB           | 145.30k              | 11.00 dB          | 125.50k
Fig. 5. (a) Performance and (b) complexity of the ISS-MA versus Ni for the 12 x 12 16-QAM system (M is set to 8).
Fig. 6. Comparison of the ISS-MA and the conventional M-algorithm for spatially correlated MIMO channels.
Fig. 7. Comparison of several soft-input soft-output detectors. The numbers in parentheses represent the
parameters of the detectors.